Re: PCI DMA lockups in 3.2 (3.3 maybe?)

Gerard Roudier Tue, 7 Dec 1999 12:46:02 -0800


On Tue, 7 Dec 1999, Peter Wemm wrote:

> I might add that others have found that using sym + fxp on the N440BX
> motherboards didn't solve their problems, or moved the problem elsewhere,
> eg: to the sbdrop() etc routines.  One other interesting variable.. an ahc
> + pn driver combination on a 440BX motherboard under -current in late may
> 99 had the exact same problems we saw a number of times with ncr + fxp (ie:
> sbdrop, sbflush, m_copym etc panics).  The same motherboard with ahc + de or
> fxp did not have the problems.

(ncr || sym || ahc) && fxp = TRUE makes the fxp a better culprit. :-)

If the corruption comes from DMA on the bus, then it may well be that
some chip grabbed a stale address or length value and then did DMA into
the corresponding area.

This may happen, for example, if a bus address is not passed atomically to
a device, or for numerous other reasons. Note that for an atomicity
problem, the chip could be the cause, by performing a non-DWORD access,
or the victim, if a PCI transaction was terminated with data straddling
2 DWORDs (note that in this latter case, something is wrong with the
alignment in memory). The 'link_addr' handling in the C code looks to
me like a candidate for such an atomicity problem, for example, but since
I do not know the fxp device this might just be a quite wrong idea of
mine.
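
To make the atomicity point concrete, here is a minimal sketch in C. It is
not taken from the fxp driver: the function names and the two-step update
are hypothetical, and it only models what a device could fetch if it read
a 32-bit bus address after one 16-bit half had been rewritten. It also
assumes that an aligned 32-bit store reaches memory as a single access,
which holds on x86 but is an assumption elsewhere.

```c
#include <stdint.h>

/*
 * Hypothetical model: value a device would fetch if it sampled a 32-bit
 * address field after only the low 16 bits had been rewritten.
 */
static uint32_t torn_after_low_half(uint32_t old_addr, uint32_t new_addr)
{
    return (old_addr & 0xFFFF0000u) | (new_addr & 0x0000FFFFu);
}

/*
 * Single aligned 32-bit store (assumed to be one bus access): a device
 * reading the field sees either the old or the new address, never a mix.
 */
static void store_dword(volatile uint32_t *field, uint32_t new_addr)
{
    *field = new_addr;
}

/* Absolute distance between two bus addresses. */
static uint32_t addr_distance(uint32_t a, uint32_t b)
{
    return a > b ? a - b : b - a;
}
```

With an old address of 0x0012F000 and a new one of 0x00130800, the torn
value is 0x00120800: neither the old nor the new address, yet less than
65536 bytes away from the old one, so it would most likely still point
into valid memory.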

> In all cases the panics were extremely "strange".  The original fxp+ncr
> combination changed its crash pattern when we put extra debugging in it to
> sanity check and check conditions.  The results varied from registers getting
> clobbered (as though an interrupt happened and the trapframe on the stack got
> changed by the interrupt handler and then returned with garbage contents in
> some registers.. this is what seems to be happening in the fxp_add_rfabuf()
> panics - %esi was getting loaded earlier on and when it got to do the
> vtophys() it was zero.

DMA performed using a stale but still valid address may lead to similar
effects, btw, since any register value ultimately comes from memory. (A
difference of less than 65536 from a valid address is unlikely to make
the address invalid, for example.)

>  People have printed the contents of "rfa" on the stack
> and seen garbage - in fact it's a register variable under normal circumstances.
> Adding debugging caused it to be stored in the local variable rather than
> being left in %esi, and then the panics moved elsewhere (!).)
> 
> It had the markings of "something trashing something somewhere and then crashing
> quite a bit later".  :-(

Gérard.


