Hello-
	I have noticed some rather odd behavior in our system, which is running
rtlinux-2.2 (the prepatched 2.2.14 kernel).  This distribution came with
rt_com under the drivers directory.  I've been using this version of rt_com
for two serial inputs - one at 4800 baud, the other at 115,200 baud - for
about a year and a half.  I have always had a problem with random kernel
page faults which would halt the system.  This initially appeared to occur
only when I read from a SCSI hard drive (using an Adaptec aic78xx
controller).  I upgraded the driver for this SCSI controller several times,
but the random page faults continued to occur.  Then about a month ago one
of our customers received a page fault while trying to write images to the
hard drive.
        Our customer also began to have problems with the frames themselves - there
would be random black lines through images.  I had also started to notice
this behavior in some of our in-house systems.  Another engineer and I
sat down and began systematically disabling portions of hardware and/or
software to isolate this problem, as we had an in-house system that failed
consistently (we are still unsure why this particular system failed
consistently while the others failed randomly).  We eventually narrowed it
down to the device running at 115,200 baud.
        I then upgraded rt_com to 0.5.3, thinking it may have been an error in
buffering the incoming data, which was then overwriting internal kernel data
structures.  Another software engineer and I scrutinized the
buffering, and it seems to be fine - no problems there.  After many
hours of commenting out various portions of the isr I was finally able to
narrow it down to the interrupt service routine in rt_com.c.  Specifically,
the errors were occurring only when there was data being sent/received over
the 115,200 baud data line.  It took me quite a while before I noticed the
fact that the isr actually runs itself four times before exiting - using a
do/while loop.  Changing this loop to run only once fixed all of our
problems.
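
A minimal sketch of the idea, not the actual rt_com code: the names
fake_uart, service_once and isr_body are hypothetical, the 16-byte FIFO
depth is an assumption, and the early exit when nothing is pending is also
an assumption.  It only illustrates how the pass limit on a do/while loop
changes the amount of work done in one interrupt.

```c
#include <stddef.h>

/* Hypothetical stand-in for a UART receive path; not rt_com's real structure. */
struct fake_uart {
    size_t pending;              /* bytes waiting in the simulated hardware */
};

#define FIFO_DEPTH 16            /* assumed 16550-style FIFO depth */

/* One pass: move at most one FIFO's worth of data. Returns bytes moved. */
static size_t service_once(struct fake_uart *u)
{
    size_t n = u->pending < FIFO_DEPTH ? u->pending : FIFO_DEPTH;
    u->pending -= n;
    return n;
}

/* Bounded handler body: repeat until idle or until max_passes is reached.
 * max_passes = 4 mimics the looping behavior described above,
 * max_passes = 1 mimics the single-pass change. */
static size_t isr_body(struct fake_uart *u, int max_passes)
{
    size_t total = 0;
    int passes = 0;

    do {
        total += service_once(u);
        passes++;
    } while (u->pending > 0 && passes < max_passes);

    return total;
}
```

With 100 bytes pending, isr_body(&u, 4) handles 64 bytes before returning,
while isr_body(&u, 1) handles 16 and returns, leaving the rest for the next
interrupt.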
	I'm just putting this out as a heads-up.  I believe the problem was that
our non-realtime components were being pre-empted by the real-time
scheduler that rt_com uses for interrupts.  The isr was then blocking
for too long, as I'm sure it had a lot of data to process at
115,200 baud.  Allowing a greater number of interrupts to occur between data
processing in the rt_com isr appears to allow the 'normal' linux drivers to
get their interrupts on a more regular basis.
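
To put rough numbers on the two line speeds, here is a small sketch of the
per-character timing.  The helper names are hypothetical, and the 10 bits
per character (8N1 framing) used in the note below is an assumption, since
the framing is not stated above.

```c
/* Time to receive one character, in microseconds.
 * bits_per_char counts start, data, parity and stop bits. */
static double char_time_us(unsigned long baud, unsigned bits_per_char)
{
    return 1e6 * (double)bits_per_char / (double)baud;
}

/* Whole characters that can arrive during a window of window_us microseconds. */
static unsigned long chars_in_window(unsigned long baud,
                                     unsigned bits_per_char,
                                     double window_us)
{
    return (unsigned long)(window_us / char_time_us(baud, bits_per_char));
}
```

Under that assumption, char_time_us(115200, 10) is about 86.8 microseconds
and char_time_us(4800, 10) is about 2083 microseconds, so the fast line
delivers a new character roughly 24 times as often as the slow one.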
        Does anyone else have any ideas on this?  Am I correct in my thinking here
or is there another phenomenon occurring?  If anyone else is doing something
at similar speeds and having problems with rt_com, I'd like to know.  Again,
I have only fixed this problem recently (about a week ago) and have had no
failures since then.  Given the random nature of the failures in the first
place, it remains to be seen if this truly fixed the problem, or will only
make it occur less frequently.

Troy Davis
Airborne Data Systems, Inc.

