On Fri, 12 Oct 2001, Norm Dresner wrote:

> I know from hard, sad experience that the normal Linux
> kernel (at least 2.2.x and I believe 2.4.x too) is able to
> deal with kernel segfaults easily.  It declares a segfault
> and "terminates" the current process.  If that was part of
> the initialization phase of the module, then it's stuck
> until the next reboot (barring heroic measures that involve
> plausibly fatal kernel table tinkering; a reboot is much
> easier) and if it was in response to a user-task's call
> (read/write/open/close/ioctl/...) then the user-task is
> terminated (if you're maintaining connection counts, you're
> out of luck because there's no notification of that and you
> can't unload your module because the count says it's still
> in use).
>
> That's what the normal Linux kernel -- even with RTLinux
> patches -- does when you're in the kernel and not in
> real-time mode.  When the incident I described occurred I
> had turned interrupts off with
>         rtl_no_interrupts( saveit );
> and I'm not sure if that would suppress a hardware
> interrupt from the segfault itself.  If so, I can see why
> the hardware would hang: The CPU has raised the SEGFAULT
> interrupt and is waiting for it to be handled but it can't
> because interrupts are disabled.   Anyway, that's my guess
> as answering my own question.
>

If I remember correctly, software interrupts, exceptions, traps, and
hardware interrupts are all pretty different things.  By turning hardware
interrupts off, one is not necessarily affecting the operation of the
other types of CPU events.  At times like these I *really* wish I had the
IA-32 architecture manuals that Intel gives out for free...
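
A rough sketch of that distinction, from memory rather than from the
manuals, so treat it as an assumption: on x86 the interrupt flag (what
cli/sti toggle) only holds off maskable external interrupts, while
processor exceptions such as a page fault, the NMI, and INT n software
interrupts are still delivered.  The enum and function names below are
made up purely for illustration.

```c
#include <stdbool.h>

/* Hypothetical classification of x86 CPU events; names are illustrative. */
enum cpu_event {
    EV_MASKABLE_IRQ,  /* external hardware interrupt (INTR pin / APIC) */
    EV_NMI,           /* non-maskable interrupt, vector 2 */
    EV_EXCEPTION,     /* fault, trap or abort, e.g. #GP (13), #PF (14) */
    EV_SOFTWARE_INT   /* raised by an INT n instruction */
};

/* Returns true if clearing the interrupt flag (cli) holds this event off. */
bool blocked_by_cli(enum cpu_event ev)
{
    /* Only maskable external interrupts honour the interrupt flag. */
    return ev == EV_MASKABLE_IRQ;
}
```

So blocked_by_cli(EV_EXCEPTION) is false: under this reading, disabling
interrupts would not by itself keep a fault from being raised.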

-Calin

>     Norm
>
>
> ----- Original Message -----
> From: Calin A. Culianu <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Cc: rtlinux <[EMAIL PROTECTED]>
> Sent: Friday, October 12, 2001 12:10 PM
> Subject: Re: [rtl] Can anyone explain why this happened?
>
>
> >
> > Well, in kernel mode there really isn't any memory protection, so chances
> > are it wasn't a 'segfault' per se, but rather a page fault.  I am willing
> > to bet that the page you tried to access wasn't in physical memory at the
> > time and maybe the kernel's swapper tried to get it off the hard disk but
> > got hung due to the interrupt characteristics of the currently running
> > (realtime) thread.  This is, however just a guess.  I wish I were more
> > familiar with kernel internals (I am getting there!) to say for sure.. but
> > my above theory seems plausible at least..... :)
> >
> > However, you raise an interesting issue:  Anyone know offhand how
> > intelligent the kernel is about detecting when a kernel thread tries to
> > access memory it has no business touching?  I am assuming the only rule is
> > that you can't read memory location '0', but other than that, anything
> > else goes... is that so?
> >
> >  -Calin
> >
> >
> >
> > On Thu, 11 Oct 2001, Norm Dresner wrote:
> >
> > > Inside a function called by a real-time task, I coded a
> > > "simple" routine to read the contents from a port.  Because
> > > I wanted not to even try to read from the address if the
> > > corresponding board wasn't there, I put an if-statement in
> > > front of the i/o read.
> > >
> > > The code I wanted to write was:
> > >     ...
> > >     if ( BoardOK[ boardno ] )
> > >         return inw( basePort );
> > >     else
> > >         return( -1 );
> > >
> > > But instead I screwed up and wrote
> > >     ...
> > >     if( BoardOK[ basePort ] )  // basePort ~ 0x300 while boardno ~ 1-4
> > >         return inw( basePort );
> > >     else
> > >         return( -1 );
> > >
> > > And the computer would *hang* every time the function was
> > > called.  I'm obviously addressing memory beyond the end of
> > > the array and there's probably not enough static data in
> > > the driver to allow addressing 0x300 * sizeof( int ) beyond
> > > the end of it without running out of the address range
> > > assigned to my driver's data segment either.  [I know that
> > > the hang is in this routine and not after it returned data
> > > because I had already commented out that code in an attempt
> > > to isolate the problem.]
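
A minimal user-space sketch of the indexing mistake described above, and
of a range check that would have caught it.  The array size, its
contents, and the helpers index_offset(), fake_inw() and read_board() are
assumptions for illustration, not the original driver code; fake_inw()
merely stands in for inw() so the sketch builds outside the kernel.

```c
#include <stddef.h>

#define MAX_BOARDS 4   /* assumption: boards are numbered 1-4 */

/* Assumed layout: slot 0 unused so board numbers index the array directly. */
int BoardOK[MAX_BOARDS + 1] = { 0, 1, 0, 1, 0 };

/* Stand-in for inw(); returns a made-up value derived from the port. */
unsigned short fake_inw(unsigned short port)
{
    return (unsigned short)(port ^ 0x5a5a);
}

/* Byte offset from the start of BoardOK that a given index would touch. */
size_t index_offset(size_t idx)
{
    return idx * sizeof BoardOK[0];
}

/* Returns the port value, or -1 if boardno is out of range or absent. */
int read_board(int boardno, unsigned short basePort)
{
    if (boardno < 1 || boardno > MAX_BOARDS)
        return -1;              /* would index outside BoardOK */
    if (!BoardOK[boardno])
        return -1;              /* board not present */
    return fake_inw(basePort);
}
```

Under these assumptions read_board(0x300, 0x300) returns -1 instead of
reading index_offset(0x300) bytes (3072 where int is four bytes) past the
start of a 5-element array.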
> > >
> > > If I was addressing beyond the end of the virtual address
> > > space for the kernel, I'd expect to get a segfault, not a
> > > hang.  And if there was memory there, why didn't I just
> > > read a value instead of hanging so solidly that only the
> > > reset or power switches would have any effect?
> > >
> > > I don't know how rtlinux handles a segfault so I'm hoping
> > > that someone else does and can explain why this happened.
> > >
> > > TIA
> > >     Norm
> > >
> > >
> > >
> > >
> > > -- [rtl] ---
> > > To unsubscribe:
> > > echo "unsubscribe rtl" | mail [EMAIL PROTECTED] OR
> > > echo "unsubscribe rtl <Your_email>" | mail [EMAIL PROTECTED]
> > > --
> > > For more information on Real-Time Linux see:
> > > http://www.rtlinux.org/
> > >
> >
>
