I've decided to gather some statistics about who's affected by the "SMP
scheduler" problem. I'd characterise this as being mainly an
interactive response problem. The included text below is a recipe for
exhibiting the problem.
Basically, I'll gather replies and look for correlations with CPU,
kernel version etc. I'm interested in people who have 2.1 *and* 2.0
kernels, and mainly SMP, but by all means perform the instructions on UP
machines for a cross-check.
I've also included a sample CPU hog (not an FPU hog as stated in the
original private email below) which tickles the problem on my SMP box.
The recipe has been made as simple as possible, and the test should take
about 3 minutes to perform :-)).
Perhaps best to respond via private email, and I'll summarise.
Things to mention:
* CPU type/speed/quantity.
* Memory type/quantity (irrelevant probably).
* Kernel version/patchlevel.
* Whether ordinary keypresses suffer delays.
* Whether ENTER keypresses suffer delays (NB these two seem to have
  subtly different causes which is why I'm asking for separate
  reporting).
thanks
Neil
PS: Also, since Rik has a patch which may fix this problem you can try
it too and give some feedback. I'm planning to try it as soon as I
finish typing this ;-)) However, I still don't see an SMP specific bit
jumping out at me so I have lingering doubts... Andrea Arcangeli's
patch will also be tried.
PPS: I'll add that an "ENTER keypress delay" has been observed on an SMP
2.1.130 kernel in a UP box, but no sign at all of an ordinary keypress
delay. On my 2.0 UP boxes, *neither* delay can be triggered. I don't
have a 2.0 SMP box.
-------------------------
Recipe now:
> Well... if you have an SMP box, then to replicate my symptoms all you
> should need to do is:
>
> * fresh boot (optional, just for a clean start)
>
> * start one CPU-hog per CPU (FPU hogs in my case which could be relevant
^^^ (not FPU hogs any more)
> I guess)
>
> * bring up a shell on a VT or xterm or network session, it doesn't
> matter
>
> * hold down any key (except ENTER) and watch the cursor repeatedly stall
> and jerk its way across the screen.
>
> * now try hitting ENTER about 10 times in quick succession (in a TCSH
> for maximum effect, perhaps this is due to ioctl()'s setting
> current->counter=0 in some cases). You should see HUGE delays of
> several tenths of a second between each printout of the command prompt.
>
> Then try the above on a non-SMP box (SMP kernel by all means, just not
> an SMP box, in fact it might work if you give numcpus=1 or whatever on
> an MP box) and you should see perfectly unaffected shell typing
> responses.
>
> Summary: one hog per CPU gives AWFUL response time on an SMP box, but
> gives no trouble at all on a non-SMP box.
>
> cheers
> Neil
-------------------------
And now the most basic hog that came to mind:
int main(void)
{
	for (;;)
		;	/* spin forever: a pure CPU hog, no FPU use */
}
---------------------------
Andrea Arcangeli wrote:
>
> On Tue, 15 Dec 1998, Neil Conway wrote:
>
> >I'll certainly have a look at it and give it a go ASAP. Can you explain
> >to me easily (since I haven't read it yet) why the behaviour should be
> >different under SMP? I haven't yet compiled a kernel with ~=0 processor
>
> Is the behaviour different in SMP? I have not read your email closely due
> to majordomo problems...
>
> >change penalty but I no longer think this is the culprit. My current
> >off-the-wall theory is different code paths in SMP... But I really have
> >no clue.
>
> Give me more info about your problem and be sure (how to reproduce it
> would be helpful too).
(Rik, I'm CCing this to you since you now have an SMP box and can
attempt to reproduce this).
Well... if you have an SMP box, then to replicate my symptoms all you
should need to do is:
* fresh boot (optional, just for a clean start)
* start one CPU-hog per CPU (FPU hogs in my case which could be relevant
I guess)
* bring up a shell on a VT or xterm or network session, it doesn't
matter
* hold down any key (except ENTER) and watch the cursor repeatedly stall
and jerk its way across the screen.
* now try hitting ENTER about 10 times in quick succession (in a TCSH
for maximum effect, perhaps this is due to ioctl()'s setting
current->counter=0 in some cases). You should see HUGE delays of
several tenths of a second between each printout of the command prompt.
Then try the above on a non-SMP box (SMP kernel by all means, just not
an SMP box, in fact it might work if you give numcpus=1 or whatever on
an MP box) and you should see perfectly unaffected shell typing
responses.
Summary: one hog per CPU gives AWFUL response time on an SMP box, but
gives no trouble at all on a non-SMP box.
cheers
Neil