Re: missing interrupts (was Re: CURRENT is freezing again ...)

2000-11-28 Thread Bruce Evans

On Mon, 27 Nov 2000, Andrew Gallatin wrote:

> 
> Bruce Evans writes:
>  > Possible causes of the problem:
>  > 1) isa_handle_intr() claims to send specific EOIs (0x30 | irq) but
>  >    actually sends non-specific ones (0x20 | garbage).  Since interrupts
> 
> I think that sending non-specific EOIs is the problem.  Sending
> specific EOIs seems to eliminate my nic timeouts and the need to
> manually feed an eoi to recover from a missing interrupt.
> 
> My question is: how does one send a specific EOI correctly?  I don't
> have decent documentation for this.  Above, you seem to imply that
> 0x30 is a specific EOI.  That does not seem to work for me (machine
> locks at boot).
> 
> Linux uses 0xe0.  According to some Tru64 docs I have,
> that means "Rotate Priority on specific EOI".  According
> to that same documentation, 0x60 is a specific EOI.  Both of these

Oops, I misread the data sheet.  0x60 is correct, 0x30 is wrong.  The
irq number is in the lowest 3 bits.

> appear to work just fine.   What should the alpha port use?

I think it should use non-specific EOIs and send them early (when
there is no ambiguity about which interrupt is being handled), as in
the i386 port.  Sending them late mainly keeps the ICU's braindamaged
interrupt priority scheme in effect for longer than necessary.

Bruce



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: missing interrupts (was Re: CURRENT is freezing again ...)

2000-11-27 Thread Robert Drehmel

In <[EMAIL PROTECTED]>,
Andrew Gallatin wrote:
> Bruce Evans writes:
>  > Possible causes of the problem:
>  > 1) isa_handle_intr() claims to send specific EOIs (0x30 | irq) but
>  >    actually sends non-specific ones (0x20 | garbage).  Since interrupts
>  >    may be handled in non-LIFO order, this results in EOIs being sent
>  >    for the wrong interrupts.  I think this just randomizes the
>  >    brokenness caused by delaying sending of EOIs.  I can't see how it
>  >    would result in an EOI being lost -- the right number of EOIs will
>  >    have been sent after all handlers have returned.
> 
> 
> I think that sending non-specific EOIs is the problem.  Sending
> specific EOIs seems to eliminate my nic timeouts and the need to
> manually feed an eoi to recover from a missing interrupt.
> 
> My question is: how does one send a specific EOI correctly?  I don't
> have decent documentation for this.  Above, you seem to imply that
> 0x30 is a specific EOI.  That does not seem to work for me (machine
> locks at boot).
> 
> Linux uses 0xe0.  According to some Tru64 docs I have,
> that means "Rotate Priority on specific EOI".  According
> to that same documentation, 0x60 is a specific EOI.  Both of these
> appear to work just fine.   What should the alpha port use?

My notes say:

Non-specific EOI             : 0x20
Specific EOI                 : 0x60 | IRQn
EOI + rotate priority        : 0xa0
EOI + select lowest priority : 0xe0 | IRQn
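
As a rough illustration of the command values in these notes, here is a
minimal Python sketch that builds the bytes for a specific EOI.  The port
numbers (0x20 and 0xa0), the cascade through level 2 of the master, and
all function names are assumptions for illustration; none of them come
from the thread or from the FreeBSD sources.  The last helper reflects the
8259A convention that a command-port byte with bit 4 set is decoded as
ICW1 (initialization), which would be consistent with 0x30 | irq locking
the machine, though the thread does not confirm that cause.

```python
# OCW2 command values for an 8259A-style ICU, matching the notes above.
EOI_NONSPECIFIC = 0x20         # clear the highest-priority in-service bit
EOI_SPECIFIC = 0x60            # clear the in-service bit named in bits 0-2
EOI_ROTATE_NONSPECIFIC = 0xA0  # non-specific EOI, then rotate priorities
EOI_ROTATE_SPECIFIC = 0xE0     # specific EOI; named level becomes lowest

# Assumptions (not from the thread): standard PC/AT command ports and the
# slave controller cascaded through level 2 of the master.
MASTER_CMD_PORT = 0x20
SLAVE_CMD_PORT = 0xA0
CASCADE_LEVEL = 2


def specific_eoi(level):
    """Return the OCW2 byte for a specific EOI on one controller (0-7)."""
    if not 0 <= level <= 7:
        raise ValueError("level must be in 0..7")
    return EOI_SPECIFIC | level


def eoi_writes_for_irq(irq):
    """Return the (port, byte) writes that acknowledge ISA irq 0-15."""
    if not 0 <= irq <= 15:
        raise ValueError("irq must be in 0..15")
    if irq < 8:
        return [(MASTER_CMD_PORT, specific_eoi(irq))]
    # A slave interrupt needs an EOI on the slave and on the master's
    # cascade input.
    return [
        (SLAVE_CMD_PORT, specific_eoi(irq - 8)),
        (MASTER_CMD_PORT, specific_eoi(CASCADE_LEVEL)),
    ]


def looks_like_icw1(byte):
    """Bit 4 set on the command port selects ICW1, not OCW2."""
    return bool(byte & 0x10)
```

Under these assumptions, eoi_writes_for_irq(10) gives 0x62 for the slave
followed by 0x62 for the master, since irq 10 is level 2 on the slave and
the cascade is level 2 on the master.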

-- 
Robert S. F. Drehmel <[EMAIL PROTECTED]>





Re: missing interrupts (was Re: CURRENT is freezing again ...)

2000-11-27 Thread Andrew Gallatin


Bruce Evans writes:
 > Possible causes of the problem:
 > 1) isa_handle_intr() claims to send specific EOIs (0x30 | irq) but
 >    actually sends non-specific ones (0x20 | garbage).  Since interrupts
 >    may be handled in non-LIFO order, this results in EOIs being sent
 >    for the wrong interrupts.  I think this just randomizes the
 >    brokenness caused by delaying sending of EOIs.  I can't see how it
 >    would result in an EOI being lost -- the right number of EOIs will
 >    have been sent after all handlers have returned.


I think that sending non-specific EOIs is the problem.  Sending
specific EOIs seems to eliminate my nic timeouts and the need to
manually feed an eoi to recover from a missing interrupt.

My question is: how does one send a specific EOI correctly?  I don't
have decent documentation for this.  Above, you seem to imply that
0x30 is a specific EOI.  That does not seem to work for me (machine
locks at boot).

Linux uses 0xe0.  According to some Tru64 docs I have,
that means "Rotate Priority on specific EOI".  According
to that same documentation, 0x60 is a specific EOI.  Both of these
appear to work just fine.   What should the alpha port use?

Thanks,

Drew








Re: missing interrupts (was Re: CURRENT is freezing again ...)

2000-11-18 Thread Bruce Evans

On Fri, 17 Nov 2000, Andrew Gallatin wrote:

> [fxp isa irq pending but never occurs]

> I then wrote a hack which sends an eoi.  If I call my hack from ddb
> and send an eoi for irq10, everything goes back to normal and the
> network interface is back.
> 
> So, is it a race in the interrupt code, or is it something about how
> the code is structured?
> 
> On the alpha at least, we get the irq, mask the irq and set the
> ithread runnable.  When the (isa) ithread runs, it calls the interrupt
> handler and then sends an eoi.  The interrupt is then unmasked.
> 
> I've peeked at the linux code and noticed that they do things
> differently.  They first mask the interrupt, and then send the eoi
> immediately -- before the handler runs.  They then run the handler
> and unmask the interrupt.  The seem to do this both on i386 and
> alpha.  

FreeBSD on i386 does the same thing as Linux, except that for fast
interrupts it delays the EOI until the handler returns, so that the
handler gets called as soon as possible.

> Does anybody have any ideas about this?  Does something bad
> happen if you don't send an eoi in a reasonable amount of time?

Delayed EOIs work normally, but lower priority interrupts (according
to the ICU's priority scheme) are masked until the EOI is sent.  This
is bad mainly because the ICU's priority scheme is different from
FreeBSD's priority scheme.

Possible causes of the problem:
1) isa_handle_intr() claims to send specific EOIs (0x30 | irq) but
   actually sends non-specific ones (0x20 | garbage).  Since interrupts
   may be handled in non-LIFO order, this results in EOIs being sent
   for the wrong interrupts.  I think this just randomizes the
   brokenness caused by delaying sending of EOIs.  I can't see how it
   would result in an EOI being lost -- the right number of EOIs will
   have been sent after all handlers have returned.
2) Insufficient locking for ICU accesses.  Again, I can't see how this
   would affect EOIs.  On i386's, some accesses are locked implicitly
   by sched_lock.
3) Enabling interrupts (and unlocking the ICU) before sending EOI seems
   to just make things more complicated.  It requires the specific EOIs
   in (1).
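
The reasoning in (1) can be checked with a small Python model of the
in-service register.  This is a simplified sketch under stated
assumptions: a single controller, fixed priorities with level 0 highest,
and a non-specific EOI that clears the highest-priority in-service bit.
The class and method names are made up for illustration and are not
FreeBSD code.

```python
class InServiceRegister:
    """Toy model of one controller's in-service bits (levels 0-7)."""

    def __init__(self):
        self.isr = 0

    def acknowledge(self, level):
        """Mark a level as in service, as happens when it is delivered."""
        self.isr |= 1 << level

    def nonspecific_eoi(self):
        """Clear the highest-priority in-service bit; return its level."""
        for level in range(8):  # level 0 is the highest priority
            if self.isr & (1 << level):
                self.isr &= ~(1 << level)
                return level
        return None

    def specific_eoi(self, level):
        """Clear exactly the named in-service bit."""
        self.isr &= ~(1 << level)
        return level


def finish_in_order(levels_in_service, finish_order, specific):
    """Return the levels actually cleared as handlers finish in order."""
    pic = InServiceRegister()
    for level in levels_in_service:
        pic.acknowledge(level)
    cleared = []
    for level in finish_order:
        if specific:
            cleared.append(pic.specific_eoi(level))
        else:
            cleared.append(pic.nonspecific_eoi())
    return cleared, pic.isr
```

With levels 5 and 3 in service and the handler for 5 finishing first
(non-LIFO), non-specific EOIs clear 3 and then 5: the first EOI hits the
wrong interrupt, but the register still ends up empty once both handlers
have returned.  Specific EOIs clear 5 and then 3.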

On alphas, interrupts aren't masked in the ICU while they are handled
(the disable/enable args in the call to alpha_setup_intr() in
isa_setup_intr() are NULL ...).  They are masked by some combination
of the CPU and ICU priorities.

Bruce






missing interrupts (was Re: CURRENT is freezing again ...)

2000-11-17 Thread Andrew Gallatin


Valentin Chopov writes:
 > Hi,
 > 
 > After last cvsup my machine (Dual PIII, SMP kernel) is freezing again in
 > 10 min after boot...
 > 

I've seen one similar problem on an alpha UP1000 that I'd like some
input about.

The UP1000 is essentially an alpha 21264 stuffed into an AMD Athlon
system.  It has an AMD-751 chipset and handles all device interrupts
via an isa interrupt controller.

I've noticed that under "heavy" load (gdb -k kernel.debug /dev/mem on
an NFS filesystem), the network interface goes away, never to
reappear.  All I see is "fxp0: device timeout" on console.
This started with SMPng.

After a little bit of investigation with ddb, I discovered that
the NIC's irq was pending.  E.g.:

login: fxp0: device timeout
Stopped at  siointr1+0x17c: br  zero,siointr1+0x32c 
db> call isa_irq_pending()
0x410

The fxp interface is at irq 10, so 0x410 (bit 10 set) means there's an
irq 10 pending.
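
For reference, a pending mask like this can be decoded with a few lines of
Python.  The sketch assumes that isa_irq_pending() returns one bit per irq
with bit n standing for irq n, which is what the reading above implies;
the helper name is hypothetical.

```python
def pending_irqs(mask):
    """Return the irq numbers whose bits are set in a 16-bit mask."""
    return [irq for irq in range(16) if mask & (1 << irq)]
```

Applied to 0x410 this returns irqs 4 and 10, so bit 4 is set in the mask
as well as bit 10.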

I then wrote a hack which sends an eoi.  If I call my hack from ddb
and send an eoi for irq10, everything goes back to normal and the
network interface is back.

So, is it a race in the interrupt code, or is it something about how
the code is structured?

On the alpha at least, we get the irq, mask the irq and set the
ithread runnable.  When the (isa) ithread runs, it calls the interrupt
handler and then sends an eoi.  The interrupt is then unmasked.

I've peeked at the linux code and noticed that they do things
differently.  They first mask the interrupt, and then send the eoi
immediately -- before the handler runs.  They then run the handler
and unmask the interrupt.  They seem to do this both on i386 and
alpha.  
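
The two orderings described above can be written down as a short Python
sketch that only records the sequence of steps.  It is not kernel code:
the function names are invented for illustration, and the handler is any
callable taking the irq number.

```python
def dispatch_late_eoi(irq, handler):
    """Ordering described for the alpha: mask, handler, EOI, unmask."""
    trace = [("mask", irq)]
    handler(irq)
    trace.append(("handler", irq))
    trace.append(("eoi", irq))
    trace.append(("unmask", irq))
    return trace


def dispatch_early_eoi(irq, handler):
    """Ordering described for Linux: mask, EOI, handler, unmask."""
    trace = [("mask", irq)]
    trace.append(("eoi", irq))
    handler(irq)
    trace.append(("handler", irq))
    trace.append(("unmask", irq))
    return trace
```

The only difference is whether the EOI is recorded before or after the
handler runs; in the late variant the EOI waits for however long the
handler takes.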

Does anybody have any ideas about this?  Does something bad
happen if you don't send an eoi in a reasonable amount of time?


Drew
--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590




