Re: PROBLEM: high load average when idle

2007-10-03 Thread Anders Boström
> "LT" == Linus Torvalds <[EMAIL PROTECTED]> writes:

 LT> On Wed, 3 Oct 2007, Chuck Ebbert wrote:
 >> 
 >> But we reduce the number of samples because some ticks just never
 >> happen when the timers get rounded:
 >> 
 >> No rounding:
 >> 
 >> tick ... tick
 >> 1 running    1 running
 >> 
 >> Rounded:
 >> 
 >> tick
 >> 2 running
 >> 
 >> In the first case the average is 1, but it's 2 in the second.

 LT> In fact, I think this is it!

 LT> The load average is not calculated every tick, because that's not just 
 LT> expensive, but we also want to have some time-based decay. So it's 
 LT> calculated every LOAD_FREQ ticks.

 LT> And guess what: LOAD_FREQ is defined to be exactly five seconds.

 LT> So imagine if the timer gets to be in sync with another event that happens 
 LT> every five seconds - let's pick at random a 5-second JBD transaction 
 LT> thing?

 LT> Anders - does this idiotic patch make a difference for you?

Yes, it does, it fixes the load average!!! I guess we have something
here!

Why does this problem only show up on my computer? Any idea?

/ Anders

 LT> Without this, I can easily imagine that the rounding code tends to try to 
 LT> round to an even second, and the load-average code generally also runs at 
 LT> even seconds!

 LT>Linus

 LT> ---
 LT>  include/linux/sched.h |2 +-
 LT>  1 files changed, 1 insertions(+), 1 deletions(-)

 LT> diff --git a/include/linux/sched.h b/include/linux/sched.h
 LT> index a01ac6d..643de0f 100644
 LT> --- a/include/linux/sched.h
 LT> +++ b/include/linux/sched.h
 LT> @@ -113,7 +113,7 @@ extern unsigned long avenrun[];	/* Load averages */
 LT>  
 LT>  #define FSHIFT	11		/* nr of bits of precision */
 LT>  #define FIXED_1	(1<<FSHIFT)	/* 1.0 as fixed-point */
 LT> -#define LOAD_FREQ	(5*HZ)		/* 5 sec intervals */
 LT> +#define LOAD_FREQ	(5*HZ+1)	/* ~5 sec intervals */
 LT>  #define EXP_1	1884		/* 1/exp(5sec/1min) as fixed-point */
 LT>  #define EXP_5	2014		/* 1/exp(5sec/5min) */
 LT>  #define EXP_15	2037		/* 1/exp(5sec/15min) */
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PROBLEM: high load average when idle

2007-10-03 Thread Anders Boström
> "AM" == Andrew Morton <[EMAIL PROTECTED]> writes:

 AM> On Tue, 02 Oct 2007 23:37:31 +0200 (CEST)
 AM> Anders Boström <[EMAIL PROTECTED]> wrote:

 >> My computer suffers from high load average when the system is idle,
 >> introduced by commit 44d306e1508fef6fa7a6eb15a1aba86ef68389a6 .
 >> 
 >> Long story:
 >> 
 >> 2.6.20 and all later versions I've tested, including 2.6.21 and
 >> 2.6.22, make the load average high. Even when the computer is totally
 >> idle (I've tested in single-user mode), the load average ends up
 >> at ~0.30. The computer is still responsive, and the only fault seems
 >> to be the too-high load average. All versions up to and including
 >> 2.6.19.7 are fine, and don't suffer from the problem.
 >> 
 >> A git bisect between 2.6.19 and 2.6.20 gave me
 >> 44d306e1508fef6fa7a6eb15a1aba86ef68389a6 "[PATCH] user of the jiffies
 >> rounding code: JBD" as the first patch with the
 >> problem. 2.6.20 with 44d306e1508fef6fa7a6eb15a1aba86ef68389a6 reverted
 >> works fine. 2.6.23-rc8 with 44d306e1508fef6fa7a6eb15a1aba86ef68389a6
 >> reverted also works fine.
 >> 
 >> This fixes the problem:
 >> 
 >> -- fs/jbd/transaction.c -
 >> index cceaf57..d38e0d5 100644
 >> @@ -55,7 +55,7 @@ get_transaction(journal_t *journal, transaction_t 
 >> *transaction)
 >> spin_lock_init(&transaction->t_handle_lock);
 >> 
 >> /* Set up the commit timer for the new transaction. */
 >> -   journal->j_commit_timer.expires = round_jiffies(transaction->t_expires);
 >> +   journal->j_commit_timer.expires = transaction->t_expires;
 >> add_timer(&journal->j_commit_timer);
 >> 
 >> J_ASSERT(journal->j_running_transaction == NULL);
 >> 
 >> 
 >> I've only seen this problem on my home desktop computer. My work
 >> desktop computer and several other computers at work don't suffer from
 >> this problem. However, all the other computers I've tested on use
 >> AMD64 as the architecture, not i386 like my home desktop computer.
 >> 
 >> Please let me know how I can assist in further debugging of this, if
 >> needed.

 AM> This is unexpected.  High load average is due to either a task chewing a
 AM> lot of CPU time or a task stuck in uninterruptible sleep.

 AM> Can you please work out which of these is happening?  Run `top' on an idle
 AM> system.  Is the CPU less than 1% loaded?

Yes, top typically shows 99.3% idle.

 AM> Run

 AM>ps aux | grep " D"

 AM> or something like that on an idle system, see if you can spot a task which
 AM> is spending time in D state.

 AM> If there's a task which is spending time in D state, try running

 AM>  echo w > /proc/sysrq-trigger ; dmesg -c > foo

 AM> then check "foo" to see if it has a task in D state (search foo for " D "). 
 AM> If it's not there, do the sysrq again, repeat until you've managed to
 AM> capture a trace of the blocked task.

 AM> If it turns out that the CPU really is spending excess amounts of time
 AM> being busy then a kernel profile would be a good way of finding out where
 AM> it is spinning.  Or run sysrq-P from the keyboard a few times.

Well, there are some kernel threads in the D state. I've seen 
md1_raid1, kjournald and pdflush in the D state. I had a very hard
time trying to catch it in the "foo" file, but take a look at this:

SysRq : Show Blocked State

                         free                        sibling
  task             PC    stack   pid father child younger older
pdflush   D C157CC68 0   151  6   152   150 (L-TLB)
   dfebacd8 0046 c151afa4 c157cc68 c0112559  0ee62d80 00d5 
   000a dfeb7070 0ee62d80 00d5  dfeb717c   
   c158 c158 c158013c 0008 00ff c02730ff  dfeb7070 
Call Trace:
 [] __wake_up_common+0x39/0x60
 [] md_write_start+0x9f/0x110
 [] autoremove_wake_function+0x0/0x50
 [] make_request+0x3b/0x5e0
 [] generic_make_request+0xe1/0x150
 [] submit_bio+0x3e/0xb0
 [] bio_alloc_bioset+0x81/0x160
 [] end_buffer_async_write+0x0/0xc0
 [] submit_bh+0xb9/0x100
 [] __block_write_full_page+0x16f/0x2d0
 [] blkdev_get_block+0x0/0x60
 [] block_write_full_page+0xb0/0xe0
 [] blkdev_get_block+0x0/0x60
 [] generic_writepages+0x20e/0x350
 [] blkdev_writepage+0x0/0x10
 [] do_writepages+0x2b/0x50
 [] __writeback_single_inode+0x8a/0x370
 [] smp_apic_timer_interrupt+0x41/0x50
 [] sync_sb_inodes+0x161/0x210
 [] writeback_inodes+0x62/0x80
 [] pdflush+0x0/0x180
 [] wb_kupdate+0x74/0xe0
 [] pdflush+0xc5/0x180
 [] wb_kupdate+0x0/0xe0
 [] kthread+0xa8/0xe0
 [] kthread+0x0/0xe0
 [] kernel_thread_helper+0x7/0x1c
 ===
md2_raid1 D 0001 0   363  6   368   331 (L-TLB)
   c151ae14 0046 0001 0001 1000 0001  0001 
   000a c1505570 0ee62d80 00d5  c150567c c152fc00 d5463b20 
   d5463b60 c158 c158013c  c158 c0270c7f  c1505570 
Call Trace:
 [] md_super_wait+0x9f/0xc0
 [] autoremove_wake_function+0x0

Re: PROBLEM: high load average when idle

2007-10-03 Thread Arjan van de Ven

Linus Torvalds wrote:
> On Wed, 3 Oct 2007, Arjan van de Ven wrote:
>> not sure this is going to help; I mean, the load gets only updated in actual
>> timer interrupts... and on a tickless system there's very few of those
>> around. and usually at places round_jiffies() already put a timer on.
>
> Yeah, you're right. Although in practice, at least on a system running 
> X, I'd expect that there still is lots of other timers going on, hiding 
> the issue.

eh not really; on a normal distro desktop you maybe have 10 
wakeups/sec or so; on a tuned one you have 2 or less.

> Hmm. Maybe Anders' problem stems partly from the fact that he really is 
> using the tweaks to make that tickless theory more true than it tends to 
> be on most systems?

we fixed a TON of stuff over the last months.. standard desktops (F8 / 
next Ubuntu) will be around 10 wakeups/sec, in a lab environment you 
can get below 2 ;)



Re: PROBLEM: high load average when idle

2007-10-03 Thread Linus Torvalds


On Wed, 3 Oct 2007, Arjan van de Ven wrote:
> 
> not sure this is going to help; I mean, the load gets only updated in actual
> timer interrupts... and on a tickless system there's very few of those
> around. and usually at places round_jiffies() already put a timer on.

Yeah, you're right. Although in practice, at least on a system running 
X, I'd expect that there still is lots of other timers going on, hiding 
the issue.

Hmm. Maybe Anders' problem stems partly from the fact that he really is 
using the tweaks to make that tickless theory more true than it tends to 
be on most systems?

Linus


Re: PROBLEM: high load average when idle

2007-10-03 Thread Arjan van de Ven

Linus Torvalds wrote:
Without this, I can easily imagine that the rounding code tends to try to 
round to an even second, and the load-average code generally also runs at 
even seconds!


Linus

---
 include/linux/sched.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index a01ac6d..643de0f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -113,7 +113,7 @@ extern unsigned long avenrun[];	/* Load averages */
 
 #define FSHIFT	11		/* nr of bits of precision */
 #define FIXED_1	(1<<FSHIFT)	/* 1.0 as fixed-point */
-#define LOAD_FREQ	(5*HZ)		/* 5 sec intervals */
+#define LOAD_FREQ	(5*HZ+1)	/* ~5 sec intervals */
 #define EXP_1	1884		/* 1/exp(5sec/1min) as fixed-point */
 #define EXP_5	2014		/* 1/exp(5sec/5min) */
 #define EXP_15	2037		/* 1/exp(5sec/15min) */

not sure this is going to help; I mean, the load gets only updated in 
actual timer interrupts... and on a tickless system there's very few 
of those around. and usually at places round_jiffies() already put 
a timer on.


(also.. one thing that might make Chuck's theory wrong is that the 
sampling code doesn't sample timer activity since that's run just 
after the sampler in the same irq)



Re: PROBLEM: high load average when idle

2007-10-03 Thread Linus Torvalds


On Wed, 3 Oct 2007, Chuck Ebbert wrote:
> 
> But we reduce the number of samples because some ticks just never
> happen when the timers get rounded:
> 
> No rounding:
> 
>   tick ... tick
>  1 running    1 running
> 
> Rounded:
> 
>   tick
>  2 running
> 
> In the first case the average is 1, but it's 2 in the second.

In fact, I think this is it!

The load average is not calculated every tick, because that's not just 
expensive, but we also want to have some time-based decay. So it's 
calculated every LOAD_FREQ ticks.

And guess what: LOAD_FREQ is defined to be exactly five seconds.

So imagine if the timer gets to be in sync with another event that happens 
every five seconds - let's pick at random a 5-second JBD transaction 
thing?

Anders - does this idiotic patch make a difference for you?

Without this, I can easily imagine that the rounding code tends to try to 
round to an even second, and the load-average code generally also runs at 
even seconds!

Linus

---
 include/linux/sched.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index a01ac6d..643de0f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -113,7 +113,7 @@ extern unsigned long avenrun[];	/* Load averages */
 
 #define FSHIFT	11		/* nr of bits of precision */
 #define FIXED_1	(1<<FSHIFT)	/* 1.0 as fixed-point */
-#define LOAD_FREQ	(5*HZ)		/* 5 sec intervals */
+#define LOAD_FREQ	(5*HZ+1)	/* ~5 sec intervals */
 #define EXP_1	1884		/* 1/exp(5sec/1min) as fixed-point */
 #define EXP_5	2014		/* 1/exp(5sec/5min) */
 #define EXP_15	2037		/* 1/exp(5sec/15min) */

Re: PROBLEM: high load average when idle

2007-10-03 Thread Chuck Ebbert
On 10/02/2007 07:26 PM, Arjan van de Ven wrote:
> On Tue, 02 Oct 2007 18:33:58 -0400
>> Or, everybody wakes up at once right when we are taking a sample.  :)
> 
> nice try but we sample every timer tick; this code being timer driven
> makes it what you say it is regardless of *which* timer tick it
> happens at ;)
> 

But we reduce the number of samples because some ticks just never
happen when the timers get rounded:

No rounding:

  tick ... tick
 1 running    1 running

Rounded:

  tick
 2 running

In the first case the average is 1, but it's 2 in the second.



Re: PROBLEM: high load average when idle

2007-10-03 Thread Thorsten Kranzkowski
On Tue, Oct 02, 2007 at 11:37:31PM +0200, Anders Boström wrote:
> Hi!
> 
> My computer suffers from high load average when the system is idle,
> introduced by commit 44d306e1508fef6fa7a6eb15a1aba86ef68389a6 .

Another datapoint: I observe a similar effect on both of my alphas:

top - 09:30:43 up 13 min, 18 users,  load average: 0.65, 0.64, 0.44
Tasks:  76 total,   1 running,  75 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1% us,  0.5% sy,  0.0% ni, 99.1% id,  0.2% wa,  0.1% hi,  0.0% si
Mem:   2067792k total,    55792k used,  2012000k free,     4160k buffers
Swap:  1048560k total,        0k used,  1048560k free,    18752k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  637 root      15   0  2904 1552 1192 R    1  0.1   0:01.35 top
  556 root      15   0  2008  528  432 S    0  0.0   0:00.01 gpm
    1 root      15   0  1960  800  680 S    0  0.0   0:01.43 init
    2 root      10  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/0
    4 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/0
    5 root      RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/0
    6 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/1
    7 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/1

This is the dual-ev6 one, currently 2.6.22-rc5. 

I didn't bother to do any investigation, yet ;-)


> This fixes the problem:

I'll check this evening.

Bye,
Thorsten


-- 
| Thorsten KranzkowskiInternet: [EMAIL PROTECTED]  |
| Mobile: ++49 170 1876134   Snail: Kiebitzstr. 14, 49324 Melle, Germany  |
| Ampr: [EMAIL PROTECTED], [EMAIL PROTECTED] [44.130.8.19] |


Re: PROBLEM: high load average when idle

2007-10-03 Thread Anders Boström
> "AvdV" == Arjan van de Ven <[EMAIL PROTECTED]> writes:

 AvdV> Anders Boström wrote:
 >> Hi!
 >> 
 >> My computer suffers from high load average when the system is idle,
 >> introduced by commit 44d306e1508fef6fa7a6eb15a1aba86ef68389a6 .
 >> 
 >> Long story:
 >> 
 >> 2.6.20 and all later versions I've tested, including 2.6.21 and
 >> 2.6.22, make the load average high. Even when the computer is totally
 >> idle (I've tested in single-user mode), the load average ends up
 >> at ~0.30. The computer is still responsive, and the only fault seems
 >> to be the too-high load average. All versions up to and including
 >> 2.6.19.7 are fine, and don't suffer from the problem.

 AvdV> can you tell me if you're tuning ext3 in any way to have a much 
 AvdV> shorter timeout than the standard 5 seconds?

No, I'm using the standard 5 seconds timeout.

/ Anders


Re: PROBLEM: high load average when idle

2007-10-02 Thread Mark Lord

Arjan van de Ven wrote:
> On Tue, 02 Oct 2007 18:46:18 -0400
>> On a related note, {set/get}itimer() currently are buggy (since
>> 2.6.11 or so), also due to this round_jiffies() thing I believe.
>
> I very much believe that it is totally unrelated... most of all since
> round_jiffies() wasn't in the kernel then and also isn't used anywhere
> near these timers.

Ah, yes, you're correct.  The itimer routines do their *own* rounding.

-ml


Re: PROBLEM: high load average when idle

2007-10-02 Thread Arjan van de Ven
On Tue, 02 Oct 2007 18:33:58 -0400
> Or, everybody wakes up at once right when we are taking a sample.  :)

nice try but we sample every timer tick; this code being timer driven
makes it what you say it is regardless of *which* timer tick it
happens at ;)


Re: PROBLEM: high load average when idle

2007-10-02 Thread Arjan van de Ven
On Tue, 02 Oct 2007 18:46:18 -0400
> 
> On a related note, {set/get}itimer() currently are buggy (since
> 2.6.11 or so), also due to this round_jiffies() thing I believe.

I very much believe that it is totally unrelated... most of all since
round_jiffies() wasn't in the kernel then and also isn't used anywhere
near these timers.


Re: PROBLEM: high load average when idle

2007-10-02 Thread Arjan van de Ven
On Tue, 2 Oct 2007 15:32:53 -0700 (PDT)

> And I wonder if the same kind thing is effectively happening here:
> the code is written so that it *tries* to sleep, but the rounding of
> the clock basically means that it's trying to sleep using a different
> clock than the one we're using to wake things up with, so some
> percentage of the time it doesn't sleep at all!

we're talking about a timer that (normally) is 5 seconds.



Re: PROBLEM: high load average when idle

2007-10-02 Thread Arjan van de Ven

Anders Boström wrote:
> Hi!
>
> My computer suffers from high load average when the system is idle,
> introduced by commit 44d306e1508fef6fa7a6eb15a1aba86ef68389a6 .
>
> Long story:
>
> 2.6.20 and all later versions I've tested, including 2.6.21 and
> 2.6.22, make the load average high. Even when the computer is totally
> idle (I've tested in single-user mode), the load average ends up
> at ~0.30. The computer is still responsive, and the only fault seems
> to be the too-high load average. All versions up to and including
> 2.6.19.7 are fine, and don't suffer from the problem.

can you tell me if you're tuning ext3 in any way to have a much 
shorter timeout than the standard 5 seconds?



Re: PROBLEM: high load average when idle

2007-10-02 Thread Mark Lord

Linus Torvalds wrote:
> On Tue, 2 Oct 2007, Andrew Morton wrote:
>> This is unexpected.  High load average is due to either a task chewing a
>> lot of CPU time or a task stuck in uninterruptible sleep.
>
> Not necessarily.
>
> We saw high load averages with the timer bogosity with "gettimeofday()" and
> "select()" not agreeing, so they would do things like
>
>	date = time(..)
>	select(.. , timeout =  )
>
> and when "date" wasn't taking the jiffies offset into account, and thus
> mixing these kinds of different time sources, the select ended up
> returning immediately because they effectively used different clocks, and
> suddenly we had some applications chewing up 30% CPU time, because they
> were in a loop that *tried* to sleep.
>
> And I wonder if the same kind of thing is effectively happening here: the
> code is written so that it *tries* to sleep, but the rounding of the clock
> basically means that it's trying to sleep using a different clock than the
> one we're using to wake things up with, so some percentage of the time it
> doesn't sleep at all!
>
> I wonder if the whole "round_jiffies()" thing should be written so that it
> never rounds down, or at least never rounds down to before the current
> second!

...

On a related note, {set/get}itimer() currently are buggy (since 2.6.11 or so),
also due to this round_jiffies() thing I believe.

If one sets ITIMER_PROF to, say, 5.00 seconds, and then reads it back
very shortly thereafter, it will give 5.20 seconds as the value (HZ==1000).

AFAIK, this should *never* be possible --> any read of getitimer should never
return a value higher than the starting value.  This makes ITIMER_PROF not very
useful for measuring one's own CPU usage, for example.

Cheers


Re: PROBLEM: high load average when idle

2007-10-02 Thread Arjan van de Ven

Linus Torvalds wrote:
> I wonder if the whole "round_jiffies()" thing should be written so that it 
> never rounds down, or at least never rounds down to before the current 
> second!

that's what it is supposed to do already...

	if (j <= jiffies) /* rounding ate our timeout entirely; */
		return original;
	return j;
}

so there is always a gap of at least 1 jiffie no matter what

> I have to say, I also think it's a bit iffy to do "round_jiffies()" at all 
> in that per-CPU kind of way. The "per-cpu" thing is quite possibly going 
> to change by the time we actually add the timer, so the goal of trying to 
> get wakeups to happen in "bunches" per CPU should really be done by 
> setting a flag on the timer itself - so that we could do that rounding 
> when the timer is actually added to the per-cpu queues!

it's pretty much the same thing though


Re: PROBLEM: high load average when idle

2007-10-02 Thread Chuck Ebbert
On 10/02/2007 06:07 PM, Andrew Morton wrote:
> On Tue, 02 Oct 2007 23:37:31 +0200 (CEST)
> Anders Boström <[EMAIL PROTECTED]> wrote:
> 
>> My computer suffers from high load average when the system is idle,
>> introduced by commit 44d306e1508fef6fa7a6eb15a1aba86ef68389a6 .
>>
>> Long story:
>>
>> 2.6.20 and all later versions I've tested, including 2.6.21 and
>> 2.6.22, make the load average high. Even when the computer is totally
>> idle (I've tested in single-user mode), the load average ends up
>> at ~0.30. The computer is still responsive, and the only fault seems
>> to be the too-high load average. All versions up to and including
>> 2.6.19.7 are fine, and don't suffer from the problem.
>>
>> A git bisect between 2.6.19 and 2.6.20 gave me
>> 44d306e1508fef6fa7a6eb15a1aba86ef68389a6 "[PATCH] user of the jiffies
>> rounding code: JBD" as the first patch with the
>> problem. 2.6.20 with 44d306e1508fef6fa7a6eb15a1aba86ef68389a6 reverted
>> works fine. 2.6.23-rc8 with 44d306e1508fef6fa7a6eb15a1aba86ef68389a6
>> reverted also works fine.
>>
>> This fixes the problem:
>>
>> -- fs/jbd/transaction.c -
>> index cceaf57..d38e0d5 100644
>> @@ -55,7 +55,7 @@ get_transaction(journal_t *journal, transaction_t 
>> *transaction)
>>  spin_lock_init(&transaction->t_handle_lock);
>>  
>>  /* Set up the commit timer for the new transaction. */
>> -journal->j_commit_timer.expires = round_jiffies(transaction->t_expires);
>> +journal->j_commit_timer.expires = transaction->t_expires;
>>  add_timer(&journal->j_commit_timer);
>>  
>>  J_ASSERT(journal->j_running_transaction == NULL);
>>
>>
>> I've only seen this problem on my home desktop computer. My work
>> desktop computer and several other computers at work don't suffer from
>> this problem. However, all the other computers I've tested on use
>> AMD64 as the architecture, not i386 like my home desktop computer.
>>
>> Please let me know how I can assist in further debugging of this, if
>> needed.
> 
> This is unexpected.  High load average is due to either a task chewing a
> lot of CPU time or a task stuck in uninterruptible sleep.
> 

Or, everybody wakes up at once right when we are taking a sample.  :)


Re: PROBLEM: high load average when idle

2007-10-02 Thread Linus Torvalds


On Tue, 2 Oct 2007, Andrew Morton wrote:
> 
> This is unexpected.  High load average is due to either a task chewing a
> lot of CPU time or a task stuck in uninterruptible sleep.

Not necessarily.

We saw high load averages with the timer bogosity with "gettimeofday()" and 
"select()" not agreeing, so they would do things like

date = time(..)
select(.. , timeout =  )

and when "date" wasn't taking the jiffies offset into account, and thus 
mixing these kinds of different time sources, the select ended up 
returning immediately because they effectively used different clocks, and 
suddenly we had some applications chewing up 30% CPU time, because they 
were in a loop that *tried* to sleep.

And I wonder if the same kind of thing is effectively happening here: the 
code is written so that it *tries* to sleep, but the rounding of the clock 
basically means that it's trying to sleep using a different clock than the 
one we're using to wake things up with, so some percentage of the time it 
doesn't sleep at all!

I wonder if the whole "round_jiffies()" thing should be written so that it 
never rounds down, or at least never rounds down to before the current 
second!

I have to say, I also think it's a bit iffy to do "round_jiffies()" at all 
in that per-CPU kind of way. The "per-cpu" thing is quite possibly going 
to change by the time we actually add the timer, so the goal of trying to 
get wakeups to happen in "bunches" per CPU should really be done by 
setting a flag on the timer itself - so that we could do that rounding 
when the timer is actually added to the per-cpu queues!

Now, I think the JBD "t_expires" field should never be "near" in seconds, 
so I do find it a bit surprising that this rounding can have any effect, 
but on the other hand it clearly *does* have some effect, so.. It might 
just be interacting with some other use, of course.

Linus


Re: PROBLEM: high load average when idle

2007-10-02 Thread Andrew Morton
On Tue, 02 Oct 2007 23:37:31 +0200 (CEST)
Anders Boström <[EMAIL PROTECTED]> wrote:

> My computer suffers from high load average when the system is idle,
> introduced by commit 44d306e1508fef6fa7a6eb15a1aba86ef68389a6 .
> 
> Long story:
> 
> 2.6.20 and all later versions I've tested, including 2.6.21 and
> 2.6.22, make the load average high. Even when the computer is totally
> idle (I've tested in single-user mode), the load average ends up
> at ~0.30. The computer is still responsive, and the only fault seems
> to be the too-high load average. All versions up to and including
> 2.6.19.7 are fine, and don't suffer from the problem.
> 
> A git bisect between 2.6.19 and 2.6.20 gave me
> 44d306e1508fef6fa7a6eb15a1aba86ef68389a6 "[PATCH] user of the jiffies
> rounding code: JBD" as the first patch with the
> problem. 2.6.20 with 44d306e1508fef6fa7a6eb15a1aba86ef68389a6 reverted
> works fine. 2.6.23-rc8 with 44d306e1508fef6fa7a6eb15a1aba86ef68389a6
> reverted also works fine.
> 
> This fixes the problem:
> 
> -- fs/jbd/transaction.c -
> index cceaf57..d38e0d5 100644
> @@ -55,7 +55,7 @@ get_transaction(journal_t *journal, transaction_t 
> *transaction)
>   spin_lock_init(&transaction->t_handle_lock);
>  
>   /* Set up the commit timer for the new transaction. */
> - journal->j_commit_timer.expires = round_jiffies(transaction->t_expires);
> + journal->j_commit_timer.expires = transaction->t_expires;
>   add_timer(&journal->j_commit_timer);
>  
>   J_ASSERT(journal->j_running_transaction == NULL);
> 
> 
> I've only seen this problem on my home desktop computer. My work
> desktop computer and several other computers at work don't suffer from
> this problem. However, all the other computers I've tested on use
> AMD64 as the architecture, not i386 like my home desktop computer.
> 
> Please let me know how I can assist in further debugging of this, if
> needed.

This is unexpected.  High load average is due to either a task chewing a
lot of CPU time or a task stuck in uninterruptible sleep.

Can you please work out which of these is happening?  Run `top' on an idle
system.  Is the CPU less than 1% loaded?

Run

ps aux | grep " D"

or something like that on an idle system, see if you can spot a task which
is spending time in D state.

If there's a task which is spending time in D state, try running

 echo w > /proc/sysrq-trigger ; dmesg -c > foo

then check "foo" to see if it has a task in D state (search foo for " D "). 
If it's not there, do the sysrq again, repeat until you've managed to
capture a trace of the blocked task.

If it turns out that the CPU really is spending excess amounts of time
being busy then a kernel profile would be a good way of finding out where
it is spinning.  Or run sysrq-P from the keyboard a few times.



PROBLEM: high load average when idle

2007-10-02 Thread Anders Boström
Hi!

My computer suffers from high load average when the system is idle,
introduced by commit 44d306e1508fef6fa7a6eb15a1aba86ef68389a6 .

Long story:

2.6.20 and all later versions I've tested, including 2.6.21 and
2.6.22, make the load average high. Even when the computer is totally
idle (I've tested in single-user mode), the load average ends up
at ~0.30. The computer is still responsive, and the only fault seems
to be the too-high load average. All versions up to and including
2.6.19.7 are fine, and don't suffer from the problem.

A git bisect between 2.6.19 and 2.6.20 gave me
44d306e1508fef6fa7a6eb15a1aba86ef68389a6 "[PATCH] user of the jiffies
rounding code: JBD" as the first patch with the
problem. 2.6.20 with 44d306e1508fef6fa7a6eb15a1aba86ef68389a6 reverted
works fine. 2.6.23-rc8 with 44d306e1508fef6fa7a6eb15a1aba86ef68389a6
reverted also works fine.

This fixes the problem:

-- fs/jbd/transaction.c -
index cceaf57..d38e0d5 100644
@@ -55,7 +55,7 @@ get_transaction(journal_t *journal, transaction_t 
*transaction)
spin_lock_init(&transaction->t_handle_lock);
 
/* Set up the commit timer for the new transaction. */
-   journal->j_commit_timer.expires = round_jiffies(transaction->t_expires);
+   journal->j_commit_timer.expires = transaction->t_expires;
add_timer(&journal->j_commit_timer);
 
J_ASSERT(journal->j_running_transaction == NULL);


I've only seen this problem on my home desktop computer. My work
desktop computer and several other computers at work don't suffer from
this problem. However, all the other computers I've tested on use
AMD64 as the architecture, not i386 like my home desktop computer.

Please let me know how I can assist in further debugging of this, if
needed.

System info:

A Debian stable system with ABIT KV7 MB, VIA KT600 chipset,
Athlon XP 1500+ CPU, GeForce DDR and Atheros AR5212 wlan
board. Details below.

I've tested without the nvidia and madwifi modules listed below, with
the same results.

eckert:/usr/src/linux-2.6>sh scripts/ver_linux 
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.
 
Linux eckert.bostrom.dyndns.org 2.6.20noload #1 Mon Oct 1 21:36:19 CEST 2007 
i686 GNU/Linux
 
Gnu C  4.1.2
Gnu make   3.81
binutils   2.17
util-linux 2.12r
mount  2.12r
module-init-tools  3.3-pre2
e2fsprogs  1.40-WIP
Linux C Library2.3.6
Dynamic linker (ldd)   2.3.6
Procps 3.2.7
Net-tools  1.60
Console-tools  0.2.3
Sh-utils   5.97
udev   105
wireless-tools 28
Modules Loaded nls_iso8859_1 nls_cp437 nvidia wlan_tkip iptable_filter 
ip_tables x_tables softdog snd_via82xx snd_ac97_codec ac97_bus snd_mpu401_uart 
snd_seq_midi snd_rawmidi wlan_scan_sta ath_rate_sample ath_pci wlan ath_hal

eckert:~> cat /proc/cpuinfo 
processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 6
model   : 6
model name  : AMD Athlon(tm) XP 1500+
stepping: 2
cpu MHz : 1383.971
cache size  : 256 KB
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 1
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 mmx fxsr sse syscall mp mmxext 3dnowext 3dnow ts
bogomips: 2769.67
clflush size: 32

eckert:~> cat /proc/ioports
0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-006f : keyboard
0070-0077 : rtc
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : 0000:00:0f.1
  0170-0177 : libata
01f0-01f7 : 0000:00:0f.1
  01f0-01f7 : libata
0295-0296 : w83627hf
0376-0376 : 0000:00:0f.1
  0376-0376 : libata
03c0-03df : vesafb
03f6-03f6 : 0000:00:0f.1
  03f6-03f6 : libata
0cf8-0cff : PCI conf1
4000-407f : motherboard
  4000-4003 : ACPI PM1a_EVT_BLK
  4004-4005 : ACPI PM1a_CNT_BLK
  4008-400b : ACPI PM_TMR
  4010-4015 : ACPI CPU throttle
  4020-4023 : ACPI GPE0_BLK
5000-500f : motherboard
  5000-5007 : vt596_smbus
c000-c007 : 0000:00:0f.0
  c000-c007 : sata_via
c400-c403 : 0000:00:0f.0
  c400-c403 : sata_via
c800-c807 : 0000:00:0f.0
  c800-c807 : sata_via
cc00-cc03 : 0000:00:0f.0
  cc00-cc03 : sata_via
d000-d00f : 0000:00:0f.0
  d000-d00f : sata_via
d400-d4ff : 0000:00:0f.0
  d400-d4ff : sata_via
d800-d80f : 0000:00:0f.1
  d800-d80f : libata
dc00-dc1f : 0000:00:10.0
  dc00-dc1f : uhci_hcd
e000-e01f : 0000:00:10.1
  e000-e01f : uhci_hcd
e400-e41f : 0000:00:10.2
  e400-e41f : uhci_hcd
e800-e81f : 0000:00:10.3
  e800-e81f : uhci_hcd
ec00-ecff : 0000:00:11.5
  ec00-ecff : VIA8237
eckert:~> cat /proc/iomem 
00000000-0009f3ff : System RAM
0009f400-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000cbbff : Video ROM