please see the bottom of this message for the origanal message..

I have moved this over to the kernel mailing list as well
because  if this interrupt bug is as you say it is then it may
affect non smp machines in a different way then SMP machines and
may cause problems elsewhere...
> 
> I had (still have, but very occasionally) a similar problem
> with 2.3.3
> and some patches on an Intel N440BX (2xPIII,256MB) system. 
> I've tried
> 2.2.x kernels as well, and without my patch, I have *lots* of
> mouse
> problems.
> 

  again my system was doing fine for about a month then last
night it came back.. I had used a patch that allowed me to
change the APIC triggering from APIC-edge to APIC-level, and
things were okay for a month or so.. 

> I tracked it down to problems with the kernel's service of
> gettimeofday() calls: the routines in
> linux/arch/i386/kernel/time.c do

okay was this code between #ifdef __SMP__  or not? and how did
you track this down? if this was not between #ifdef's then
people on non SMP machines should have clock skews and interrupt
misses also.. 

> calculations to detect missed interrupts and (if your
> processor has a
> TSC) to improve time resolution.  What would happen is that
> the missed
> interrupts calculation and/or the TSC-elapsed-time calculation
> would
> return a "negative" number which, since it was treated as
> unsigned,
> would end up advancing the clock a lot (usually about
> 4294967000uS, or

hmm that may explain why my clock is off, however it takes a
while before it skews an hour...

> a little more than an hour).  Then, on the next call to
> gettimeofday(), the clock would bounce back to the correct
> time.
> However, the damage was done: on the forward bounce, the X
> screensaver
> would kick in, and on the backwards bounce, the mouse pointer
> calculations would get screwed up, resulting in erratic
> pointer
> behaviour, random bogus clicks, and so on.
> 
> I made a patch that prevents gettimeofday() from returning
> times
> earlier than it returned previously, and also flags when the
> missed
> interrupts or TSC calculations return a large forward
> adjustment; this
> has solved most of my problems, but I still occasionally get
> wild time
> warps where the clock starts racing forward.  I haven't been
> able to
> track this down, because once it starts, cron jobs and the
> like get
> started up (logrotate, find, makewhatis, etc) and the system
> dies a
> quick, messy death.
> 
> Andrea Archangeli also made some patches that were designed to
> eliminate missed interrupts (and detect them, too), but I
> still get
> log messages about "lost ticks."
> 
> Anyway, you can detect time warps pretty easily with a program
> that
> just calls gettimeofday() in a tight loop.  I'll append the
> one I used
> when I was actively working on this problem.  I've pretty much
> given
> up, though; Andrea's & my patches makes my machine mostly
> usable, and
> I figure eventually one of the non-devel kernels will solve
> all the
> problems the Right Way.
> 
> d.
> 
>                                                         * * * * *
> 
> #include <stdio.h>
> #include <sys/types.h>
> #include <time.h>
> #include <sys/time.h>
> #include <unistd.h>
> 
> inline long
> timeval_diff( const struct timeval *a, const struct timeval *b
> )
> {
>       if (a->tv_sec < b->tv_sec ||
>               (a->tv_sec == b->tv_sec && a->tv_usec < b->tv_usec))
>               return -((b->tv_sec - a->tv_sec) * 1000000 +
> (b->tv_usec-a->tv_usec));
>       else
>               return (a->tv_sec - b->tv_sec) * 1000000 + (a->tv_usec -
> b->tv_usec);
> }
> 
> int
> main( int argc, char **argv )
> {
>       struct timeval   now;
>       struct timeval   then;
>       unsigned long    n_fwd = 0;
>       long                     max_fwd = 0;
>       unsigned long    n_back = 0;
>       long                     max_back = 0;
>       time_t                   last_log = time( 0 );
> 
>       gettimeofday( &then, 0 );
>       
>       for ( ; ; ) {
>               long     delta;
>               
>               gettimeofday( &now, 0 );
> 
>               delta = timeval_diff( &now, &then );
> 
>               if (delta >= 0)
>                       ++n_fwd;
>               else
>                       ++n_back;
> 
>               if (delta > max_fwd)
>                       max_fwd = delta;
>               else
>               if (delta < max_back)
>                       max_back = delta;
> 
>               if (now.tv_sec >= last_log + 1) {
>                       printf( "%.19s FWD:%6ld/%6ld:BACK (%4ldus > delta >
> %4ldus)\n",
>                                       ctime( &now.tv_sec ),
>                                  n_fwd, n_back, max_back, max_fwd );
>                       n_fwd = n_back = 0;
>                       max_fwd = max_back = 0;
>                       last_log = now.tv_sec;
>               }
> 
>               then = now;
>       }
> 
>       return 0;
> }
> 
> 
*******************original message *************************
it came back last night, while using X my mouse suddenly went
                  bezerko again.

                  <breif history>
                  SMP machine DUAL 233 tyan tomcat IV (buggy
BIOS)
                  2.2.9 or later kernels
                  there have been problems with the APIC's on
numerous machines

                  a patch was sent out to change the behavior of
the interrupt
                  from APIC-level to APIC-edge or XT-PIC or from
anyone to another
                  (thanks Jos)... I used this patch to change
the behavior from
                  edge to level, and for a while this was
working, but last night
                  it came back. 

                  I was writing some code in nedit, and had
gnome-terminal open,
                  and runing fvwm95 as the window manager, with
kfm and xteddy..
                  </end history>

                  I'll be switiching the APIC-level to XT-PIC as
I know that will
                  work. It did with 2.0 kernels and I tried it
once before and
                  seemed to work, but if it doesn't I'll let you
all know. Also
                  level seemed to be working just fine for a
while now, probably
                  about a month

                  <Good news>
                  I was using a program I wrote (mpstat self
plug here) to log
                  some information, (cpu %, interrupts, maj/min
faults) and have
                  that log file saved with the date in the
filename <bad news>but
                  not on this machine. Although this program
does log interrupts,
                  they are not seperate, they are summed up, so
if there is a
                  sudden increase it would be difficult to tell
where it was but
                  it is possible to guess.</bad news> Also in
viewing this there
                  was no increase in interrupts that I saw, but
I only did a quick
                  glance. (If you want this email me direcly
please and I'll send
                  it tonight)
                  </good news>

                  This bug effects SMP machines (APIC bug) and
causes bezerko
                  mouse behavior, and can also cause NFS
problems (not sure what
                  thou)

                  I know that work is being done on this to fix
the bug (not sure
                  of how that is going). 

                  I have been given a temporary solution (the
patch -> change to
                  XT-PIC from APIC). but this means that all my
interrrupt for
                  that IRQ will be on the boot cpu. :-(

                  The original problem was believed to be in the
APIC-edge code.
                  But since I have been using the patch, and had
set it to
                  APIC-level this may no longer be true. The way
I see it there
                  are one of two possible problems then:
                      1) my mouse needs to be edge triggered but
then why did it
                  work for a month as lkevel triggered oh so
well?
                      2) the bug is not in the APIC-edge code,
but somewhere else
                  in the source code. 

                  This is an illusive bug, and very hard to
reproduce and occurs
                  randomly. Just thought an update was
necessary.

                  Joe
_____________________________________________________________
Do You Yahoo!?
Free instant messaging and more at http://messenger.yahoo.com

-
Linux SMP list: FIRST see FAQ at http://www.irisa.fr/prive/mentre/smp-faq/
To Unsubscribe: send "unsubscribe linux-smp" to [EMAIL PROTECTED]

Reply via email to