Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
* Thomas Gleixner <[EMAIL PROTECTED]> wrote: > Hi all, > > On Thu, 2005-08-18 at 08:01 +0200, Ingo Molnar wrote: > > i have released the 2.6.13-rc6-rt9 tree, which can be downloaded from > > the usual place: > > I reworked the code for dynamically setting the priority of the > hrtimer softirq to be aware of PI. looks mostly good to me. I'm uneasy about the fact that we touch p->normal_prio without taking the runqueue lock. Best would be to clean these things up by introducing a wake_up_process_prio() and wake_up_process_chprio() method that does these things embedded in try_to_wake_up(). > I stumbled over "//#define MUTEX_IRQS_OFF" in the first attempt. My > assumption that all code protected by pi_lock (which is a raw lock) is > irq save turned out to be wrong. I missed that commented define :( I > guess it was introduced during the "IRQ latency contest" to squeeze > out the last nsecs :) > > Switching it back on is not really influencing system performance in a > measurable way, but it allows to use the pi aware boosting function in > irq context. > > Quite contrary it makes the system more snappy and the overall test > latencies go down. we can undo that flag - it's indeed only a couple of cycles worth of optimization, which wont count for most workloads. I've applied your patch, but we need to do those cleanups and the fact needs to be documented that !MUTEX_IRQS_OFF is not safe anymore. (most likely this means that the MUTEX_IRQS_OFF flag and all related changes needs to be gotten rid of) Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > > Quite contrary it makes the system more snappy and the overall test > > latencies go down. > > we can undo that flag - it's indeed only a couple of cycles worth of > optimization, which wont count for most workloads. I've applied your > patch, but we need to do those cleanups and the fact needs to be > documented that !MUTEX_IRQS_OFF is not safe anymore. (most likely this > means that the MUTEX_IRQS_OFF flag and all related changes needs to be > gotten rid of) your patch gets rid of the flag, but not of all side-effects: e.g. trace_local_irq_enable(ti) only takes a 'ti' parameter for !MUTEX_IRQS_OFF - which does not exist anymore. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
Thomas Gleixner wrote: On Sat, 2005-08-20 at 18:10 +0400, Oleg Nesterov wrote: posix_timer_event() first checks that the thread (SIGEV_THREAD_ID case) does not have PF_EXITING flag, then it calls send_sigqueue() which locks task list. But if the thread exits in between the kernel will oops. posix_timer_event() runs under k_itimer.it_lock, but this does not help if that thread was not the only one in thread group, in this case we don't call exit_itimers(). I added exit_notify_itimers() for that case. It is synchronized vs. posix_timer_event() via the timer lock and therefor protects against the race described above. It seems to me that exit (or exec) calling the timer release code covered this. I haven't gone through the exact sequence being discussed, however. In any case, I don't think we should send the signal to the group leader instead, but rather should release the timer and cancel any pending signal. AFAIKT there is no reason the group leader can not exit prior to other threads, but I may have missed this... George It solves the problem for the only user of send_sigqueue for now, but I think we should have a more general interface to such functionality to allow simple usage. tglx Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]> diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6/include/linux/sched.h linux-2.6.13-rc6.work/include/linux/sched.h --- linux-2.6.13-rc6/include/linux/sched.h 2005-08-13 12:25:56.0 +0200 +++ linux-2.6.13-rc6.work/include/linux/sched.h 2005-08-20 17:43:36.0 +0200 @@ -1051,6 +1051,7 @@ extern void __exit_signal(struct task_st extern void exit_sighand(struct task_struct *); extern void __exit_sighand(struct task_struct *); extern void exit_itimers(struct signal_struct *); +extern void exit_notify_itimers(struct signal_struct *); extern NORET_TYPE void do_group_exit(int); diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6/kernel/posix-timers.c linux-2.6.13-rc6.work/kernel/posix-timers.c --- linux-2.6.13-rc6/kernel/posix-timers.c 2005-08-13 12:25:58.0 +0200 +++ linux-2.6.13-rc6.work/kernel/posix-timers.c 2005-08-20 17:43:36.0 +0200 @@ -1169,6 +1169,33 @@ void exit_itimers(struct signal_struct * } /* + * This is called by __exit_signal, when there are still references to + * the shared signal_struct + */ +void exit_notify_itimers(struct signal_struct *sig) +{ + struct k_itimer *timer; + struct list_head *tmp; + unsigned long flags; + + list_for_each(tmp, &sig->posix_timers) { + + timer = list_entry(tmp, struct k_itimer, list); + + /* Protect against posix_timer_fn */ + spin_lock_irqsave(&timer->it_lock, flags); + if (timer->it_process == current) { + + if (timer->it_sigev_notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID)) + timer->it_sigev_notify = SIGEV_SIGNAL; + + timer->it_process = timer->it_process->group_leader; + } + spin_lock_irqrestore(&timer->it_lock, flags); + } +} + +/* * And now for the "clock" calls * * These functions are called both from timer functions (with the timer diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6/kernel/signal.c linux-2.6.13-rc6.work/kernel/signal.c --- linux-2.6.13-rc6/kernel/signal.c2005-08-13 12:25:58.0 +0200 +++ linux-2.6.13-rc6.work/kernel/signal.c 2005-08-20 17:57:46.0 +0200 @@ -390,6 +390,7 @@ void __exit_signal(struct task_struct *t sig->nvcsw += tsk->nvcsw; sig->nivcsw += tsk->nivcsw; sig->sched_time += tsk->sched_time; + exit_notify_itimers(sig); spin_unlock(&sighand->siglock); sig = NULL; /* Marker for below. */ } @@ -1381,13 +1382,12 @@ send_sigqueue(int sig, struct sigqueue * int ret = 0; /* -* We need the tasklist lock even for the specific -* thread case (when we don't need to follow the group -* lists) in order to avoid races with "p->sighand" -* going away or changing from under us. + * No need to hold tasklist lock here. + * posix_timer_event() is synchronized via + * exit_itimers() and exit_notify_itimers() to + * be protected against p->sighand == NULL */ BUG_ON(!(q->flags & SIGQUEUE_PREALLOC)); - read_lock(&tasklist_lock); spin_lock_irqsave(&p->sighand->siglock, flags); if (unlikely(!list_empty(&q->list))) { @@ -1414,7 +1414,6 @@ send_sigqueue(int sig, struct sigqueue * out: spin_unlock_irqrestore(&p->sighand->siglock, flags); - read_unlock(&tasklist_lock); return(ret); } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo inf
Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
Thomas Gleixner wrote: > > On Sat, 2005-08-20 at 18:10 +0400, Oleg Nesterov wrote: > > > posix_timer_event() first checks that the thread (SIGEV_THREAD_ID > > case) does not have PF_EXITING flag, then it calls send_sigqueue() > > which locks task list. But if the thread exits in between the kernel > > will oops. > > > posix_timer_event() runs under k_itimer.it_lock, but this does not > > help if that thread was not the only one in thread group, in this > > case we don't call exit_itimers(). > > I added exit_notify_itimers() for that case. It is synchronized vs. > posix_timer_event() via the timer lock and therefor protects against the > race described above. > > It solves the problem for the only user of send_sigqueue for now, but I > think we should have a more general interface to such functionality to > allow simple usage. > > tglx > > Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]> > > diff -uprN --exclude-from=/usr/local/bin/diffit.exclude > linux-2.6.13-rc6/include/linux/sched.h > linux-2.6.13-rc6.work/include/linux/sched.h > --- linux-2.6.13-rc6/include/linux/sched.h 2005-08-13 12:25:56.0 > +0200 > +++ linux-2.6.13-rc6.work/include/linux/sched.h 2005-08-20 17:43:36.0 > +0200 > @@ -1051,6 +1051,7 @@ extern void __exit_signal(struct task_st > extern void exit_sighand(struct task_struct *); > extern void __exit_sighand(struct task_struct *); > extern void exit_itimers(struct signal_struct *); > +extern void exit_notify_itimers(struct signal_struct *); > > extern NORET_TYPE void do_group_exit(int); > > diff -uprN --exclude-from=/usr/local/bin/diffit.exclude > linux-2.6.13-rc6/kernel/posix-timers.c > linux-2.6.13-rc6.work/kernel/posix-timers.c > --- linux-2.6.13-rc6/kernel/posix-timers.c 2005-08-13 12:25:58.0 > +0200 > +++ linux-2.6.13-rc6.work/kernel/posix-timers.c 2005-08-20 17:43:36.0 > +0200 > @@ -1169,6 +1169,33 @@ void exit_itimers(struct signal_struct * > } > > /* > + * This is called by __exit_signal, when there are still references to > + * the shared signal_struct > + */ > +void exit_notify_itimers(struct signal_struct *sig) > +{ > + struct k_itimer *timer; > + struct list_head *tmp; > + unsigned long flags; > + > + list_for_each(tmp, &sig->posix_timers) { > + > + timer = list_entry(tmp, struct k_itimer, list); > + > + /* Protect against posix_timer_fn */ > + spin_lock_irqsave(&timer->it_lock, flags); I think this is deadlockable. We already (release_task) hold tasklist_lock here. What if this timer has no SIGEV_THREAD_ID ? posix_timer_fn locks timer->it_lock first, then locks tasklist_lock in send_group_sigqueue(). Also, sys_timer_delete() locks ->it_lock, then ->sighand.lock. I think the better fix would be re-check ->sighand in send_sigqueue, like Paul's patches do. Oleg. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
On Sat, 2005-08-20 at 18:10 +0400, Oleg Nesterov wrote: > posix_timer_event() first checks that the thread (SIGEV_THREAD_ID > case) does not have PF_EXITING flag, then it calls send_sigqueue() > which locks task list. But if the thread exits in between the kernel > will oops. > posix_timer_event() runs under k_itimer.it_lock, but this does not > help if that thread was not the only one in thread group, in this > case we don't call exit_itimers(). I added exit_notify_itimers() for that case. It is synchronized vs. posix_timer_event() via the timer lock and therefor protects against the race described above. It solves the problem for the only user of send_sigqueue for now, but I think we should have a more general interface to such functionality to allow simple usage. tglx Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]> diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6/include/linux/sched.h linux-2.6.13-rc6.work/include/linux/sched.h --- linux-2.6.13-rc6/include/linux/sched.h 2005-08-13 12:25:56.0 +0200 +++ linux-2.6.13-rc6.work/include/linux/sched.h 2005-08-20 17:43:36.0 +0200 @@ -1051,6 +1051,7 @@ extern void __exit_signal(struct task_st extern void exit_sighand(struct task_struct *); extern void __exit_sighand(struct task_struct *); extern void exit_itimers(struct signal_struct *); +extern void exit_notify_itimers(struct signal_struct *); extern NORET_TYPE void do_group_exit(int); diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6/kernel/posix-timers.c linux-2.6.13-rc6.work/kernel/posix-timers.c --- linux-2.6.13-rc6/kernel/posix-timers.c 2005-08-13 12:25:58.0 +0200 +++ linux-2.6.13-rc6.work/kernel/posix-timers.c 2005-08-20 17:43:36.0 +0200 @@ -1169,6 +1169,33 @@ void exit_itimers(struct signal_struct * } /* + * This is called by __exit_signal, when there are still references to + * the shared signal_struct + */ +void exit_notify_itimers(struct signal_struct *sig) +{ + struct k_itimer *timer; + struct list_head *tmp; + unsigned long flags; + + list_for_each(tmp, &sig->posix_timers) { + + timer = list_entry(tmp, struct k_itimer, list); + + /* Protect against posix_timer_fn */ + spin_lock_irqsave(&timer->it_lock, flags); + if (timer->it_process == current) { + + if (timer->it_sigev_notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID)) + timer->it_sigev_notify = SIGEV_SIGNAL; + + timer->it_process = timer->it_process->group_leader; + } + spin_lock_irqrestore(&timer->it_lock, flags); + } +} + +/* * And now for the "clock" calls * * These functions are called both from timer functions (with the timer diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6/kernel/signal.c linux-2.6.13-rc6.work/kernel/signal.c --- linux-2.6.13-rc6/kernel/signal.c2005-08-13 12:25:58.0 +0200 +++ linux-2.6.13-rc6.work/kernel/signal.c 2005-08-20 17:57:46.0 +0200 @@ -390,6 +390,7 @@ void __exit_signal(struct task_struct *t sig->nvcsw += tsk->nvcsw; sig->nivcsw += tsk->nivcsw; sig->sched_time += tsk->sched_time; + exit_notify_itimers(sig); spin_unlock(&sighand->siglock); sig = NULL; /* Marker for below. */ } @@ -1381,13 +1382,12 @@ send_sigqueue(int sig, struct sigqueue * int ret = 0; /* -* We need the tasklist lock even for the specific -* thread case (when we don't need to follow the group -* lists) in order to avoid races with "p->sighand" -* going away or changing from under us. +* No need to hold tasklist lock here. +* posix_timer_event() is synchronized via +* exit_itimers() and exit_notify_itimers() to +* be protected against p->sighand == NULL */ BUG_ON(!(q->flags & SIGQUEUE_PREALLOC)); - read_lock(&tasklist_lock); spin_lock_irqsave(&p->sighand->siglock, flags); if (unlikely(!list_empty(&q->list))) { @@ -1414,7 +1414,6 @@ send_sigqueue(int sig, struct sigqueue * out: spin_unlock_irqrestore(&p->sighand->siglock, flags); - read_unlock(&tasklist_lock); return(ret); } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
Thomas Gleixner wrote: > > send_sigqueue is called from posix_timer_fn() and acquires > tasklist_lock, which makes no sense to me. > > send_sigqueue()s (l)onl(e)y user is the posix_timer function > (posix_timer_fn(), calling posix_timer_event()). > > Each posix timer blocks the task from vanishing away by > get_task_struct(), which is protected by the held tasklist_lock. > > The task can neither go away nor the signal handler can change until > put_task_struct() is called inside release_posix_timer(), which removes > any chance to do an invalid access to either task or sighand because the > relevant timer is deleted before the call to put_task_struct(). Also > this call is protected by tasklist_lock(). Yes, the task_struct can't go away, but if process exited this task_struct is just chunk of garbage. I think the intent was to protect against this case. However, I agree with you, locking the tasklist_lock can't help, and the code is wrong. posix_timer_event() first checks that the thread (SIGEV_THREAD_ID case) does not have PF_EXITING flag, then it calls send_sigqueue() which locks task list. But if the thread exits in between the kernel will oops. posix_timer_event() runs under k_itimer.it_lock, but this does not help if that thread was not the only one in thread group, in this case we don't call exit_itimers(). The comment is wrong too. ->sighand can't change, we are clearing posix timer on exec, and tasklist can't prevent ->sighand from going away.. Ingo, Roland, George, am I wrong? Oleg. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
Thomas Gleixner wrote: ~ 2. Drift of cyclic timers (armed by set_timer()): Due to rounding errors and the drift adjustment code, the fixed increment which is precalculated when the timer is set up and added on rearm, I see creeping deviation from the timeline. I have a patch lined up to base the rearm on human (nsac) units, so this effect will go away. But this is waste of time until (1.) is not solved. George ??? Could I (we) see what you have in mind? -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
Thomas Gleixner wrote: George, On Fri, 2005-08-19 at 17:19 -0700, George Anzinger wrote: 2. Drift of cyclic timers (armed by set_timer()): Due to rounding errors and the drift adjustment code, the fixed increment which is precalculated when the timer is set up and added on rearm, I see creeping deviation from the timeline. I have a patch lined up to base the rearm on human (nsac) units, so this effect will go away. But this is waste of time until (1.) is not solved. George ??? Could I (we) see what you have in mind? Nothing which applies clean at the moment and I have no access to the box where the patch floats around. It's simply explained. Current code: set_timer() calc interval->jiffies / interval->arch_cycles; based on it.interval rearm() timer->expires += interval->jiffies; timer->arch_cycle_expires += interval->arch_cycles; normalize(timer); Patched code: set_timer() timer.interval = it.interval; timer.next_expire = it.value; both stored as timespec rearm() next_expire += interval; calc timer->expires/arch_cycle_expires; So on each rearm we eliminate the rounding errors and take the drift adjustment into account. It adds some calculation overhead to each rearm, but I think the standard was written to eliminate the need for this. The notion is that we have a resolution which we use in the calculations so while there may be drift WRT his request, there should be no drift WRT the requested value rounded up to the next resolution. Still, if we can't keep that resolution in arch_cycles... On another issue along this line, I have been thinking of changing the x86 TSC arch cycle size to 1ns. (NOT the resolution, the units for the arch cycle.) The reason to do this is to correctly track changes in cpu frequency as it is today, we would need to track down and update all pending HR timers when ever the frequency changed. By using a common unit all we need to do is change the conversion constants (well I guess they would not be constants any more :). I REALLY don't want to do this as it does add conversion overhead, but I can not think of another clean way to track TSC frequency changes. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
George, On Fri, 2005-08-19 at 17:19 -0700, George Anzinger wrote: > > 2. Drift of cyclic timers (armed by set_timer()): > > > > Due to rounding errors and the drift adjustment code, the fixed > > increment which is precalculated when the timer is set up and added on > > rearm, I see creeping deviation from the timeline. > > > > I have a patch lined up to base the rearm on human (nsac) units, so this > > effect will go away. But this is waste of time until (1.) is not solved. > > > > George ??? > > Could I (we) see what you have in mind? Nothing which applies clean at the moment and I have no access to the box where the patch floats around. It's simply explained. Current code: set_timer() calc interval->jiffies / interval->arch_cycles; based on it.interval rearm() timer->expires += interval->jiffies; timer->arch_cycle_expires += interval->arch_cycles; normalize(timer); Patched code: set_timer() timer.interval = it.interval; timer.next_expire = it.value; both stored as timespec rearm() next_expire += interval; calc timer->expires/arch_cycle_expires; So on each rearm we eliminate the rounding errors and take the drift adjustment into account. It adds some calculation overhead to each rearm, but tglx - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/