Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment

2005-08-22 Thread Ingo Molnar

* Thomas Gleixner <[EMAIL PROTECTED]> wrote:

> Hi all,
> 
> On Thu, 2005-08-18 at 08:01 +0200, Ingo Molnar wrote:
> > i have released the 2.6.13-rc6-rt9 tree, which can be downloaded from 
> > the usual place:
> 
> I reworked the code for dynamically setting the priority of the 
> hrtimer softirq to be aware of PI.

looks mostly good to me. I'm uneasy about the fact that we touch 
p->normal_prio without taking the runqueue lock. Best would be to clean 
these things up by introducing a wake_up_process_prio() and 
wake_up_process_chprio() method that does these things embedded in 
try_to_wake_up().

> I stumbled over "//#define MUTEX_IRQS_OFF" in the first attempt. My 
> assumption that all code protected by pi_lock (which is a raw lock) is 
> irq save turned out to be wrong. I missed that commented define :( I 
> guess it was introduced during the "IRQ latency contest" to squeeze 
> out the last nsecs :)
>
> Switching it back on is not really influencing system performance in a 
> measurable way, but it allows to use the pi aware boosting function in 
> irq context.
> 
> Quite contrary it makes the system more snappy and the overall test 
> latencies go down.

we can undo that flag - it's indeed only a couple of cycles worth of 
optimization, which wont count for most workloads. I've applied your 
patch, but we need to do those cleanups and the fact needs to be 
documented that !MUTEX_IRQS_OFF is not safe anymore. (most likely this 
means that the MUTEX_IRQS_OFF flag and all related changes needs to be 
gotten rid of)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment

2005-08-22 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> > Quite contrary it makes the system more snappy and the overall test 
> > latencies go down.
> 
> we can undo that flag - it's indeed only a couple of cycles worth of 
> optimization, which wont count for most workloads. I've applied your 
> patch, but we need to do those cleanups and the fact needs to be 
> documented that !MUTEX_IRQS_OFF is not safe anymore. (most likely this 
> means that the MUTEX_IRQS_OFF flag and all related changes needs to be 
> gotten rid of)

your patch gets rid of the flag, but not of all side-effects: e.g.  
trace_local_irq_enable(ti) only takes a 'ti' parameter for 
!MUTEX_IRQS_OFF - which does not exist anymore.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment

2005-08-22 Thread George Anzinger

Thomas Gleixner wrote:

On Sat, 2005-08-20 at 18:10 +0400, Oleg Nesterov wrote:



posix_timer_event() first checks that the thread (SIGEV_THREAD_ID
case) does not have PF_EXITING flag, then it calls send_sigqueue()
which locks task list. But if the thread exits in between the kernel
will oops.




posix_timer_event() runs under k_itimer.it_lock, but this does not
help if that thread was not the only one in thread group, in this
case we don't call exit_itimers().



I added exit_notify_itimers() for that case. It is synchronized vs.
posix_timer_event() via the timer lock and therefor protects against the
race described above.


It seems to me that exit (or exec) calling the timer release code 
covered this.  I haven't gone through the exact sequence being 
discussed, however.  In any case, I don't think we should send the 
signal to the group leader instead, but rather should release the timer 
and cancel any pending signal.  AFAIKT there is no reason the group 
leader can not exit prior to other threads, but I may have missed this...


George


It solves the problem for the only user of send_sigqueue for now, but I
think we should have a more general interface to such functionality to
allow simple usage.

tglx

Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>


diff -uprN --exclude-from=/usr/local/bin/diffit.exclude 
linux-2.6.13-rc6/include/linux/sched.h 
linux-2.6.13-rc6.work/include/linux/sched.h
--- linux-2.6.13-rc6/include/linux/sched.h  2005-08-13 12:25:56.0 
+0200
+++ linux-2.6.13-rc6.work/include/linux/sched.h 2005-08-20 17:43:36.0 
+0200
@@ -1051,6 +1051,7 @@ extern void __exit_signal(struct task_st
 extern void exit_sighand(struct task_struct *);
 extern void __exit_sighand(struct task_struct *);
 extern void exit_itimers(struct signal_struct *);
+extern void exit_notify_itimers(struct signal_struct *);
 
 extern NORET_TYPE void do_group_exit(int);
 
diff -uprN --exclude-from=/usr/local/bin/diffit.exclude linux-2.6.13-rc6/kernel/posix-timers.c linux-2.6.13-rc6.work/kernel/posix-timers.c

--- linux-2.6.13-rc6/kernel/posix-timers.c  2005-08-13 12:25:58.0 
+0200
+++ linux-2.6.13-rc6.work/kernel/posix-timers.c 2005-08-20 17:43:36.0 
+0200
@@ -1169,6 +1169,33 @@ void exit_itimers(struct signal_struct *
 }
 
 /*

+ * This is called by __exit_signal, when there are still references to
+ * the shared signal_struct
+ */
+void exit_notify_itimers(struct signal_struct *sig)
+{
+   struct k_itimer *timer;
+   struct list_head *tmp;
+   unsigned long flags;
+
+   list_for_each(tmp, &sig->posix_timers) {
+
+   timer = list_entry(tmp, struct k_itimer, list);
+
+   /* Protect against posix_timer_fn */
+   spin_lock_irqsave(&timer->it_lock, flags);
+   if (timer->it_process == current) {
+   
+   if (timer->it_sigev_notify == 
(SIGEV_SIGNAL|SIGEV_THREAD_ID))
+   timer->it_sigev_notify = SIGEV_SIGNAL;
+   
+   timer->it_process = timer->it_process->group_leader;
+		} 
+		spin_lock_irqrestore(&timer->it_lock, flags);

+   }
+}
+
+/*
  * And now for the "clock" calls
  *
  * These functions are called both from timer functions (with the timer
diff -uprN --exclude-from=/usr/local/bin/diffit.exclude 
linux-2.6.13-rc6/kernel/signal.c linux-2.6.13-rc6.work/kernel/signal.c
--- linux-2.6.13-rc6/kernel/signal.c2005-08-13 12:25:58.0 +0200
+++ linux-2.6.13-rc6.work/kernel/signal.c   2005-08-20 17:57:46.0 
+0200
@@ -390,6 +390,7 @@ void __exit_signal(struct task_struct *t
sig->nvcsw += tsk->nvcsw;
sig->nivcsw += tsk->nivcsw;
sig->sched_time += tsk->sched_time;
+   exit_notify_itimers(sig);
spin_unlock(&sighand->siglock);
sig = NULL; /* Marker for below.  */
}
@@ -1381,13 +1382,12 @@ send_sigqueue(int sig, struct sigqueue *
int ret = 0;
 
 	/*

-* We need the tasklist lock even for the specific
-* thread case (when we don't need to follow the group
-* lists) in order to avoid races with "p->sighand"
-* going away or changing from under us.
+	 * No need to hold tasklist lock here. 
+	 * posix_timer_event() is synchronized via 
+	 * exit_itimers() and exit_notify_itimers() to 
+	 * be protected against p->sighand == NULL

 */
BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
-	read_lock(&tasklist_lock);  
 	spin_lock_irqsave(&p->sighand->siglock, flags);


if (unlikely(!list_empty(&q->list))) {
@@ -1414,7 +1414,6 @@ send_sigqueue(int sig, struct sigqueue *
 
 out:

spin_unlock_irqrestore(&p->sighand->siglock, flags);
-   read_unlock(&tasklist_lock);
return(ret);
 }
 



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo inf

Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment

2005-08-20 Thread Oleg Nesterov
Thomas Gleixner wrote:
> 
> On Sat, 2005-08-20 at 18:10 +0400, Oleg Nesterov wrote:
> 
> > posix_timer_event() first checks that the thread (SIGEV_THREAD_ID
> > case) does not have PF_EXITING flag, then it calls send_sigqueue()
> > which locks task list. But if the thread exits in between the kernel
> > will oops.
> 
> > posix_timer_event() runs under k_itimer.it_lock, but this does not
> > help if that thread was not the only one in thread group, in this
> > case we don't call exit_itimers().
> 
> I added exit_notify_itimers() for that case. It is synchronized vs.
> posix_timer_event() via the timer lock and therefor protects against the
> race described above.
> 
> It solves the problem for the only user of send_sigqueue for now, but I
> think we should have a more general interface to such functionality to
> allow simple usage.
> 
> tglx
> 
> Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
> 
> diff -uprN --exclude-from=/usr/local/bin/diffit.exclude 
> linux-2.6.13-rc6/include/linux/sched.h 
> linux-2.6.13-rc6.work/include/linux/sched.h
> --- linux-2.6.13-rc6/include/linux/sched.h  2005-08-13 12:25:56.0 
> +0200
> +++ linux-2.6.13-rc6.work/include/linux/sched.h 2005-08-20 17:43:36.0 
> +0200
> @@ -1051,6 +1051,7 @@ extern void __exit_signal(struct task_st
>  extern void exit_sighand(struct task_struct *);
>  extern void __exit_sighand(struct task_struct *);
>  extern void exit_itimers(struct signal_struct *);
> +extern void exit_notify_itimers(struct signal_struct *);
> 
>  extern NORET_TYPE void do_group_exit(int);
> 
> diff -uprN --exclude-from=/usr/local/bin/diffit.exclude 
> linux-2.6.13-rc6/kernel/posix-timers.c 
> linux-2.6.13-rc6.work/kernel/posix-timers.c
> --- linux-2.6.13-rc6/kernel/posix-timers.c  2005-08-13 12:25:58.0 
> +0200
> +++ linux-2.6.13-rc6.work/kernel/posix-timers.c 2005-08-20 17:43:36.0 
> +0200
> @@ -1169,6 +1169,33 @@ void exit_itimers(struct signal_struct *
>  }
> 
>  /*
> + * This is called by __exit_signal, when there are still references to
> + * the shared signal_struct
> + */
> +void exit_notify_itimers(struct signal_struct *sig)
> +{
> +   struct k_itimer *timer;
> +   struct list_head *tmp;
> +   unsigned long flags;
> +
> +   list_for_each(tmp, &sig->posix_timers) {
> +
> +   timer = list_entry(tmp, struct k_itimer, list);
> +
> +   /* Protect against posix_timer_fn */
> +   spin_lock_irqsave(&timer->it_lock, flags);

I think this is deadlockable. We already (release_task) hold
tasklist_lock here. What if this timer has no SIGEV_THREAD_ID ?

posix_timer_fn locks timer->it_lock first, then locks tasklist_lock
in send_group_sigqueue().

Also, sys_timer_delete() locks ->it_lock, then ->sighand.lock.

I think the better fix would be re-check ->sighand in send_sigqueue,
like Paul's patches do.

Oleg.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment

2005-08-20 Thread Thomas Gleixner
On Sat, 2005-08-20 at 18:10 +0400, Oleg Nesterov wrote:

> posix_timer_event() first checks that the thread (SIGEV_THREAD_ID
> case) does not have PF_EXITING flag, then it calls send_sigqueue()
> which locks task list. But if the thread exits in between the kernel
> will oops.

> posix_timer_event() runs under k_itimer.it_lock, but this does not
> help if that thread was not the only one in thread group, in this
> case we don't call exit_itimers().

I added exit_notify_itimers() for that case. It is synchronized vs.
posix_timer_event() via the timer lock and therefor protects against the
race described above.

It solves the problem for the only user of send_sigqueue for now, but I
think we should have a more general interface to such functionality to
allow simple usage.

tglx

Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>


diff -uprN --exclude-from=/usr/local/bin/diffit.exclude 
linux-2.6.13-rc6/include/linux/sched.h 
linux-2.6.13-rc6.work/include/linux/sched.h
--- linux-2.6.13-rc6/include/linux/sched.h  2005-08-13 12:25:56.0 
+0200
+++ linux-2.6.13-rc6.work/include/linux/sched.h 2005-08-20 17:43:36.0 
+0200
@@ -1051,6 +1051,7 @@ extern void __exit_signal(struct task_st
 extern void exit_sighand(struct task_struct *);
 extern void __exit_sighand(struct task_struct *);
 extern void exit_itimers(struct signal_struct *);
+extern void exit_notify_itimers(struct signal_struct *);
 
 extern NORET_TYPE void do_group_exit(int);
 
diff -uprN --exclude-from=/usr/local/bin/diffit.exclude 
linux-2.6.13-rc6/kernel/posix-timers.c 
linux-2.6.13-rc6.work/kernel/posix-timers.c
--- linux-2.6.13-rc6/kernel/posix-timers.c  2005-08-13 12:25:58.0 
+0200
+++ linux-2.6.13-rc6.work/kernel/posix-timers.c 2005-08-20 17:43:36.0 
+0200
@@ -1169,6 +1169,33 @@ void exit_itimers(struct signal_struct *
 }
 
 /*
+ * This is called by __exit_signal, when there are still references to
+ * the shared signal_struct
+ */
+void exit_notify_itimers(struct signal_struct *sig)
+{
+   struct k_itimer *timer;
+   struct list_head *tmp;
+   unsigned long flags;
+
+   list_for_each(tmp, &sig->posix_timers) {
+
+   timer = list_entry(tmp, struct k_itimer, list);
+
+   /* Protect against posix_timer_fn */
+   spin_lock_irqsave(&timer->it_lock, flags);
+   if (timer->it_process == current) {
+   
+   if (timer->it_sigev_notify == 
(SIGEV_SIGNAL|SIGEV_THREAD_ID))
+   timer->it_sigev_notify = SIGEV_SIGNAL;
+   
+   timer->it_process = timer->it_process->group_leader;
+   } 
+   spin_lock_irqrestore(&timer->it_lock, flags);
+   }
+}
+
+/*
  * And now for the "clock" calls
  *
  * These functions are called both from timer functions (with the timer
diff -uprN --exclude-from=/usr/local/bin/diffit.exclude 
linux-2.6.13-rc6/kernel/signal.c linux-2.6.13-rc6.work/kernel/signal.c
--- linux-2.6.13-rc6/kernel/signal.c2005-08-13 12:25:58.0 +0200
+++ linux-2.6.13-rc6.work/kernel/signal.c   2005-08-20 17:57:46.0 
+0200
@@ -390,6 +390,7 @@ void __exit_signal(struct task_struct *t
sig->nvcsw += tsk->nvcsw;
sig->nivcsw += tsk->nivcsw;
sig->sched_time += tsk->sched_time;
+   exit_notify_itimers(sig);
spin_unlock(&sighand->siglock);
sig = NULL; /* Marker for below.  */
}
@@ -1381,13 +1382,12 @@ send_sigqueue(int sig, struct sigqueue *
int ret = 0;
 
/*
-* We need the tasklist lock even for the specific
-* thread case (when we don't need to follow the group
-* lists) in order to avoid races with "p->sighand"
-* going away or changing from under us.
+* No need to hold tasklist lock here. 
+* posix_timer_event() is synchronized via 
+* exit_itimers() and exit_notify_itimers() to 
+* be protected against p->sighand == NULL
 */
BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
-   read_lock(&tasklist_lock);  
spin_lock_irqsave(&p->sighand->siglock, flags);

if (unlikely(!list_empty(&q->list))) {
@@ -1414,7 +1414,6 @@ send_sigqueue(int sig, struct sigqueue *
 
 out:
spin_unlock_irqrestore(&p->sighand->siglock, flags);
-   read_unlock(&tasklist_lock);
return(ret);
 }
 


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment

2005-08-20 Thread Oleg Nesterov
Thomas Gleixner wrote:
>
> send_sigqueue is called from posix_timer_fn() and acquires
> tasklist_lock, which makes no sense to me.
>
> send_sigqueue()s (l)onl(e)y user is the posix_timer function
> (posix_timer_fn(), calling posix_timer_event()).
>
> Each posix timer blocks the task from vanishing away by
> get_task_struct(), which is protected by the held tasklist_lock.
>
> The task can neither go away nor the signal handler can change until
> put_task_struct() is called inside release_posix_timer(), which removes
> any chance to do an invalid access to either task or sighand because the
> relevant timer is deleted before the call to put_task_struct(). Also
> this call is protected by tasklist_lock().

Yes, the task_struct can't go away, but if process exited this
task_struct is just chunk of garbage. I think the intent was to
protect against this case.

However, I agree with you, locking the tasklist_lock can't help,
and the code is wrong.

posix_timer_event() first checks that the thread (SIGEV_THREAD_ID
case) does not have PF_EXITING flag, then it calls send_sigqueue()
which locks task list. But if the thread exits in between the kernel
will oops.

posix_timer_event() runs under k_itimer.it_lock, but this does not
help if that thread was not the only one in thread group, in this
case we don't call exit_itimers().

The comment is wrong too. ->sighand can't change, we are clearing
posix timer on exec, and tasklist can't prevent ->sighand from
going away..

Ingo, Roland, George, am I wrong?

Oleg.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment

2005-08-20 Thread George Anzinger

Thomas Gleixner wrote:
~


2. Drift of cyclic timers (armed by set_timer()):

Due to rounding errors and the drift adjustment code, the fixed
increment which is precalculated when the timer is set up and added on
rearm, I see creeping deviation from the timeline. 


I have a patch lined up to base the rearm on human (nsac) units, so this
effect will go away. But this is waste of time until (1.) is not solved.

George ???


Could I (we) see what you have in mind?




--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment

2005-08-20 Thread George Anzinger

Thomas Gleixner wrote:

George,

On Fri, 2005-08-19 at 17:19 -0700, George Anzinger wrote:


2. Drift of cyclic timers (armed by set_timer()):

Due to rounding errors and the drift adjustment code, the fixed
increment which is precalculated when the timer is set up and added on
rearm, I see creeping deviation from the timeline. 


I have a patch lined up to base the rearm on human (nsac) units, so this
effect will go away. But this is waste of time until (1.) is not solved.

George ???


Could I (we) see what you have in mind?



Nothing which applies clean at the moment and I have no access to the
box where the patch floats around.

It's simply explained.

Current code:

set_timer()
calc interval->jiffies / interval->arch_cycles;
based on it.interval

rearm()
timer->expires += interval->jiffies;
timer->arch_cycle_expires += interval->arch_cycles;
normalize(timer);

Patched code:

set_timer()
	timer.interval = it.interval; 
	timer.next_expire = it.value; 
	both stored as timespec


rearm()
next_expire += interval;
calc timer->expires/arch_cycle_expires;

So on each rearm we eliminate the rounding errors and take the drift
adjustment into account.

It adds some calculation overhead to each rearm, but 

I think the standard was written to eliminate the need for this.  The 
notion is that we have a resolution which we use in the calculations so 
while there may be drift WRT his request, there should be no drift WRT 
the requested value rounded up to the next resolution.


Still, if we can't keep that resolution in arch_cycles...

On another issue along this line, I have been thinking of changing the 
x86 TSC arch cycle size to 1ns.  (NOT the resolution, the units for the 
arch cycle.)  The reason to do this is to correctly track changes in cpu 
frequency as it is today, we would need to track down and update all 
pending HR timers when ever the frequency changed.  By using a common 
unit all we need to do is change the conversion constants (well I guess 
they would not be constants any more :).  I REALLY don't want to do this 
as it does add conversion overhead, but I can not think of another clean 
way to track TSC frequency changes.


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment

2005-08-19 Thread Thomas Gleixner
George,

On Fri, 2005-08-19 at 17:19 -0700, George Anzinger wrote:
> > 2. Drift of cyclic timers (armed by set_timer()):
> > 
> > Due to rounding errors and the drift adjustment code, the fixed
> > increment which is precalculated when the timer is set up and added on
> > rearm, I see creeping deviation from the timeline. 
> > 
> > I have a patch lined up to base the rearm on human (nsac) units, so this
> > effect will go away. But this is waste of time until (1.) is not solved.
> > 
> > George ???
> 
> Could I (we) see what you have in mind?

Nothing which applies clean at the moment and I have no access to the
box where the patch floats around.

It's simply explained.

Current code:

set_timer()
calc interval->jiffies / interval->arch_cycles;
based on it.interval

rearm()
timer->expires += interval->jiffies;
timer->arch_cycle_expires += interval->arch_cycles;
normalize(timer);

Patched code:

set_timer()
timer.interval = it.interval; 
timer.next_expire = it.value; 
both stored as timespec

rearm()
next_expire += interval;
calc timer->expires/arch_cycle_expires;

So on each rearm we eliminate the rounding errors and take the drift
adjustment into account.

It adds some calculation overhead to each rearm, but 

tglx


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/