=>From: Joe <[EMAIL PROTECTED]>
 =>...
 =>I have moved this over to the kernel mailing list as well
 =>because  if this interrupt bug is as you say it is then it may
 =>affect non smp machines in a different way then SMP machines and
 =>may cause problems elsewhere...
 =>> 
 =>> I had (still have, but very occasionally) a similar problem
 =>> with 2.3.3
 =>> and some patches on an Intel N440BX (2xPIII,256MB) system. 
 =>> I've tried
 =>> 2.2.x kernels as well, and without my patch, I have *lots* of
 =>> mouse
 =>> problems.
 =>> 
 =>
 =>  again my system was doing fine for about a month then last
 =>night it came back.. I had used a patch that allowed me to
 =>change the APIC triggering from APIC-edge to APIC-level, and
 =>things were okay for a month or so.. 
 =>
 =>> I tracked it down to problems with the kernel's service of
 =>> gettimeofday() calls: the routines in
 =>> linux/arch/i386/kernel/time.c do
 =>
 =>okay was this code between #ifdef __SMP__  or not? and how did
 =>you track this down? if this was not between #ifdef's then
 =>people on non SMP machines should have clock skews and interrupt
 =>misses also.. 

The area I frobbed was not inside #ifdef __SMP__.  Only one section of
my current time.c is, and that happens to be in the
do_timer_interrupt function (!).

I tracked it down after I noticed my asclock blinking forward (to a
later time) and then back; I figured out how that could cause the
screensaver problems I was seeing, and my X server vendor (XiG)
confirmed that non-monotonic time could cause erratic mouse
behaviour.  Then I wrote that gettimeofday() program and saw the
problem in all its gory, and finally I hacked my kernel to prevent
gettimeofday() from returning an earlier time.  Of course, that only
stops time from going backwards; it does nothing about forward jumps,
and I still get occasional crashes when the clock starts racing
forward.  I don't know what triggers that.
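
For anyone who wants to look for the same symptom, here is a minimal
sketch of a user-space check.  It is not the original test program or
the kernel hack: the names tv_to_usec, clamp_monotonic and
count_backward_steps are made up for illustration, and the clamp is
only a user-space analogue of "never return an earlier time".

```c
#include <stdio.h>
#include <sys/time.h>

/* Convert a struct timeval to microseconds since the epoch. */
static long long tv_to_usec(const struct timeval *tv)
{
    return (long long)tv->tv_sec * 1000000LL + tv->tv_usec;
}

/*
 * Hypothetical clamp: never report a time earlier than the last one
 * reported.  Note that it does nothing about forward jumps.
 */
static long long clamp_monotonic(long long last, long long now)
{
    return (now < last) ? last : now;
}

/*
 * Call gettimeofday() `iterations` times and count how often the
 * raw reading is earlier than the previous raw reading.
 */
static long count_backward_steps(long iterations)
{
    struct timeval tv;
    long long prev, now;
    long backward = 0;
    long i;

    gettimeofday(&tv, NULL);
    prev = tv_to_usec(&tv);

    for (i = 0; i < iterations; i++) {
        gettimeofday(&tv, NULL);
        now = tv_to_usec(&tv);
        if (now < prev) {
            backward++;
            printf("time went back %lld usec\n", prev - now);
        }
        prev = now;
    }
    return backward;
}
```

Calling count_backward_steps() with a large iteration count from a
small main() while the machine is under interrupt load should return 0
on a healthy system; any other result means gettimeofday() stepped
backwards at least once.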

I have a debugging printk in there now, and I've found that the TSC
calculations seem to be OK.  It's the value of delay_at_last_interrupt
that is sometimes negative or very large, and that is what makes the
clock race.  I've tried to trace it further, but I get lost in the
8254 timer code.
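
As an aside on "negative/very large": in C the same bit pattern reads
as a small negative number when signed and as a huge number when
unsigned, so the two descriptions can refer to one bad value.  The
sketch below only illustrates that and a plausibility check one could
put next to a debugging printk.  It is not a diagnosis and makes no
claim about how time.c declares the variable; the helper names and the
10000 usec tick (HZ=100) are assumptions.

```c
#include <limits.h>

/* Assumed tick length in microseconds (HZ=100); adjust if different. */
#define ASSUMED_TICK_USEC 10000L

/* Reinterpret a signed microsecond offset as unsigned (hypothetical). */
static unsigned long as_unsigned_usec(long offset_usec)
{
    /* Conversion is modulo 2^N, so -1 becomes ULONG_MAX. */
    return (unsigned long)offset_usec;
}

/*
 * Hypothetical sanity check: an offset within a tick should lie
 * between 0 and the tick length.  Returns 1 if plausible, else 0.
 */
static int offset_is_plausible(long offset_usec)
{
    return offset_usec >= 0 && offset_usec <= ASSUMED_TICK_USEC;
}
```

A check like offset_is_plausible() would flag both a negative value
and its wrapped, very large unsigned form when the latter is cast back
to a signed type of the same width.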

I've never seen the problem on a single-processor box (not even on this
machine before I installed the second CPU).  I really hesitate to
guess what might cause it, but interrupts would be something I'd look
at carefully.  I found I had more problems when I was doing heavy
serial I/O or burning a CD-ROM.

I'd be willing to look at it again if somebody's interested.  I can
reproduce the original problem at will with a stock 2.2.10 kernel.  I
haven't upgraded from 2.3.3 because (last I looked) somebody had
broken the VFAT and/or NT filesystems, but I could try a more recent
kernel too.

d.
-
Linux SMP list: FIRST see FAQ at http://www.irisa.fr/prive/mentre/smp-faq/
To Unsubscribe: send "unsubscribe linux-smp" to [EMAIL PROTECTED]
