=>From: Joe <[EMAIL PROTECTED]>
=>...
=>I have moved this over to the kernel mailing list as well
=>because if this interrupt bug is as you say it is then it may
=>affect non-SMP machines in a different way than SMP machines and
=>may cause problems elsewhere...
=>>
=>> I had (still have, but very occasionally) a similar problem with
=>> 2.3.3 and some patches on an Intel N440BX (2xPIII, 256MB) system.
=>> I've tried 2.2.x kernels as well, and without my patch, I have
=>> *lots* of mouse problems.
=>>
=>
=> again my system was doing fine for about a month then last
=>night it came back.. I had used a patch that allowed me to
=>change the APIC triggering from APIC-edge to APIC-level, and
=>things were okay for a month or so..
=>
=>> I tracked it down to problems with the kernel's service of
=>> gettimeofday() calls: the routines in
=>> linux/arch/i386/kernel/time.c do
=>
=>okay was this code between #ifdef __SMP__ or not? and how did
=>you track this down? if this was not between #ifdef's then
=>people on non SMP machines should have clock skews and interrupt
=>misses also..

The area I frobbed was not inside an #ifdef __SMP__ block. Only one
section of my current time.c is, and that happens to be in the
do_timer_interrupt function (!).

I tracked it down after I noticed my asclock blinking forward (to a
later time) and then back. I figured out how that could cause the
screensaver problems I was seeing, and my X server vendor (XiG)
confirmed that non-monotonic time could cause erratic mouse
behaviour. Then I wrote that gettimeofday() program and saw the
problem in all its glory, and finally I hacked my kernel to prevent
gettimeofday from returning an earlier time. Of course, that hack does
nothing to stop time from jumping forward, and I still get occasional
crashes when the clock starts racing ahead, but I don't know what
triggers it.

I have a debugging printk in there now, and I've found that the TSC
calculations seem to be OK. It's the value of delay_at_last_interrupt
that sometimes comes out negative or very large, and that is what
makes the clock race. I've tried to trace it out, but I get lost with
the 8254 timer stuff.

I've never seen the problem on a single-processor box (not even on
this machine before I installed the second CPU). I really hesitate to
guess what might cause it, but interrupts would be something I'd look
at carefully. I found I had more problems when I was doing heavy
serial I/O or burning a CD-ROM.

I'd be willing to look at it again if somebody's interested. I can
reproduce the original problem at will with a stock 2.2.10 kernel. I
haven't upgraded from 2.3.3 because (last I looked) somebody had
broken the VFAT and/or NT filesystems, but I could try a more recent
kernel too.
d.
-
Linux SMP list: FIRST see FAQ at http://www.irisa.fr/prive/mentre/smp-faq/
To Unsubscribe: send "unsubscribe linux-smp" to [EMAIL PROTECTED]