On Fri, Aug 13, 2010 at 8:25 PM, Artyom Tarasenko <atar4q...@googlemail.com> wrote:
> 2010/7/30 Blue Swirl <blauwir...@gmail.com>:
>> On Tue, Jul 27, 2010 at 8:10 PM, Artyom Tarasenko
>> <atar4q...@googlemail.com> wrote:
>>> 2010/7/27 Blue Swirl <blauwir...@gmail.com>:
>>>> On Mon, Jul 26, 2010 at 10:23 PM, Artyom Tarasenko
>>>> <atar4q...@googlemail.com> wrote:
>>>>> 2010/7/26 Blue Swirl <blauwir...@gmail.com>:
>>>>>> On Mon, Jul 26, 2010 at 4:53 PM, Artyom Tarasenko
>>>>>> <atar4q...@googlemail.com> wrote:
>>>>>>> 2010/6/21 Artyom Tarasenko <atar4q...@googlemail.com>:
>>>>>>>> 2010/5/25 Blue Swirl <blauwir...@gmail.com>:
>>>>>>>>>>> About bugs, IIRC NetBSD 3.x crash could be related to IOMMU.
>>>>>>>>>>
>>>>>>>>>> What does indicate it? It happens where the disk sizes are normally
>>>>>>>>>> reported, so it could be a scsi/dma/irq/fpu issue as well.
>>>>>>>>>
>>>>>>>>> IIRC the DVMA address was 0xfc004000, but the mapped entries were for
>>>>>>>>> 0xfc000000 to 0xfc003fff.
>>>>>>>
>>>>>>> Under OpenBIOS. And even less with OBP, and much less if the network
>>>>>>> card is disabled.
>>>>>>>
>>>>>>>> It looks like we have multiple problems here: they start with
>>>>>>>> 0xfc004000 access (which can theoretically be expected on the real
>>>>>>>> hardware too) as you pointed out, but what happens afterwards is
>>>>>>>> strange too:
>>>>>>>>
>>>>>>>> - In the current qemu implementation we have a screaming NMI which
>>>>>>>> NetBSD can not clear. This happens cause NMI in qemu is literally
>>>>>>>> non-maskable, while on the real hardware it can be masked with the
>>>>>>>> 'mask all' flag. I'll send a patch for it.
>>>>>>>>
>>>>>>>> - with the masking patch, the NMI is not screaming but still is
>>>>>>>> percepted as spurious. This may be ok if NetBSD (1.6-3.1) doesn't have
>>>>>>>> a moduleerr_handler set.
>>>>>>>
>>>>>>> Or because scsi dma transfer on a real hardware never generates a nmi.
>>>>>>>
>>>>>>> In the current implementation, when "select with attention" is
>>>>>>> processed, scsi controller initiates a dma transfer and fetches a CDB.
>>>>>>> If dma fails (not mapped, or not allowed), NMI is generated. It is
>>>>>>> quite a strange design: such an error is an asynchronous event, and
>>>>>>> CPU wouldn't know, that scsi controller tried to do some dma at
>>>>>>> certain address. It would have been more consequent to send the error
>>>>>>> notification to the dma initiator (scsi controller in this case), not
>>>>>>> to CPU.
>>>>>>>
>>>>>>> The offending code in NetBSD 1.6-3.1:
>>>>>>>
>>>>>>> NCRCMD(sc, NCRCMD_SELATN | NCRCMD_DMA); // Here it crashes (under
>>>>>>> qemu) cause dma page is not valid
>>>>>>> NCRDMA_SETUP(sc, &sc->sc_cmdp, &sc->sc_cmdlen, 0, &dmasize); // The
>>>>>>> page would have been made valid here.
>>>>>>> NCRDMA_GO(sc);
>>>>>>>
>>>>>>> In the working versions (before 1.6 and after 4.0) the code looks like
>>>>>>> this:
>>>>>>>
>>>>>>> NCRDMA_SETUP(sc, &sc->sc_cmdp, &sc->sc_cmdlen, 0, &dmasize);
>>>>>>> //...
>>>>>>> NCRCMD(sc, NCRCMD_SELATN | NCRCMD_DMA);
>>>>>>> NCRDMA_GO(sc);
>>>>>>>
>>>>>>> After debugging the code on the real hardware, it looks like qemu has
>>>>>>> multiple problems in scsi/dma/iommu layer.
>>>>>>>
>>>>>>> I modified NCRDMA_SETUP, so that it did dma transfer without mapping
>>>>>>> the page. In this case NetBSD 3.1 shows the following error (on a real
>>>>>>> SS-20):
>>>>>>>
>>>>>>> dma0: error:
>>>>>>> csr=a4440212<ERR,DRAINING=0,IEN,ENDMA,BURST=1,FASTER,ALOADED>
>>>>>>> esp0: DMA error; resetting
>>>>>>> dma0: error:
>>>>>>> csr=a4440212<ERR,DRAINING=0,IEN,ENDMA,BURST=1,FASTER,ALOADED>
>>>>>>>
>>>>>>> no NMI.
>>>>>>>
>>>>>>> And what is more important, on the real hardware "select with
>>>>>>> attention" does not initiate dma (put a delay, waited 2 seconds and
>>>>>>> nothing happened). It has to be done manually.
>>>>>>>
>>>>>>> Any suggestions how to fix it according to the current iommu/dma
>>>>>>> architecture? Looks like "select with attention" should register
>>>>>>> callbacks? ( Volunteers? ;-) )
>>>>>>
>>>>>> Excellent analysis!
>>>>>>
>>>>>> About NMI: IOMMU just raises the qemu_irq provided by sun4m.c. The
>>>>>> interrupt bit number is currently 30, which is Module Error
>>>>>> (asynchronous fault). Maybe this should be 29, MSI (MBus-SBus
>>>>>> Interface) interrupt? That is still NMI though. Could you check what
>>>>>> interrupt bits get active in the interrupt controller master status?
>>>>>
>>>>> 80000, (system timer/level 10 only), but I have to check that no one
>>>>> steals it.
>>>>>
>>>>>> What is in IOMMU AFSR?
>>>>>
>>>>> 0, as well as AFAR. I have to ensure that no one steals it before I
>>>>> read it, though. On the real hw tracking registers is more tricky than
>>>>> on emulated.
>>>>>
>>>>>> About select with attention: NCRDMA_GO just tweaks DMA controller, so
>>>>>> ESP shouldn't perform the transfer if DMA is not ready.
>>>>>
>>>>> You mean the controller is tweaked before satn and then NCRDMA_GO
>>>>> allows the transfer, and then ESP performs it?
>>>>> To me it looks like NCRDMA_SETUP not just sets DVMA, but also programs
>>>>> CSR to actually perform the transfer:
>>>>> http://fxr.watson.org/fxr/source/dev/ic/lsi64854.c?v=NETBSD3#L359
>>>>
>>>> Almost, DMA_GO finally enables DMA:
>>>> http://fxr.watson.org/fxr/source/dev/ic/lsi64854var.h?v=NETBSD3#L96
>>>
>>> Yes, but doesn't setting target and csr registers imply the command is
>>> transferred manually, and not by ESP via "Select with attention" ?
>>
>> I'd suppose target and csr have no effect until DMA is enabled.
>
> Yes, but the question is what device triggers the dma when it gets
> enabled. IIUYC, you say that "select with attention" makes ESP to
> perform a DMA transfer.
> But I think esp doesn't do it, transfer is
> rather done by the dma controller as programmed in its registers when
> the DMA gets enabled.

Funny, that is also what I meant. I thought you meant that setting
target and csr already started the transfer.

> I suggest select with/out attention don't perform s->dma_memory_read,
> but rather pass a callback to the DMA controller, so the controller
> pulls this callback when/if dma_memory_read happens.

Something like that. I looked at this briefly, but queuing the ESP
actions does not seem very easy.

>>>>>> I think Linux always pre-programs DMA.
>>>>>>
>>>>>> One way to handle this would be to add a qemu_irq signal from DMA to
>>>>>> ESP which tells ESP whether DMA is ready. DMA raises or lowers the
>>>>>> interrupt whenever DMA is enabled or disabled. When the IRQ is
>>>>>> received by ESP, If there is no transfer pending, it just adjusts an
>>>>>> internal flag about DMA status. If there is a transfer pending, it is
>>>>>> started. When ESP handles a command, it should check the internal DMA
>>>>>> flag. If DMA is ready, continue with the transfer immediately like
>>>>>> now. Otherwise, hold the transfer and store parameters to internal
>>>>>> state.
>>>>>
>>>>> Sounds reasonable. So you suggest that ESP dma channel is wired to
>>>>> ESP, and not slavio? Or to both of them?
>>>
>>> On a second thought, doesn't setting a callback seem more appropriate?
>>> Select with(out) attention would just connect the target to the esp,
>>> and dma would communicate with the target via the esp callback.
>>
>> The signal approach should be more generic. ESP is also used by MIPS.
>
> But MIPS also has a dma controller. What can be the problem? Or is it
> known that the current dma/select with(out) attention implementation
> works as the real MIPS hardware?

It's not the SATN thing, but making the changes opt-in. If we add a
callback, MIPS code should probably be changed immediately. With the
IRQ solution, that can be postponed until someone who knows MIPS
hardware wants to implement the signal.
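
To make the IRQ idea a bit more concrete, here is a rough standalone
sketch of the flow described above. It is not QEMU code: EspModel,
DmaModel and all function names are made up for illustration, the
handler only mimics the shape of a qemu_irq handler (opaque, line,
level), and the "transfer" is reduced to a counter. The has_ready_line
argument is my assumption about how the opt-in could look: a board that
does not wire the signal keeps the current "transfer immediately"
behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the ESP device state (not QEMU's ESPState). */
typedef struct EspModel {
    bool dma_ready;          /* last level seen on the DMA-ready line */
    bool transfer_pending;   /* a command is held until DMA is enabled */
    uint32_t pending_len;    /* parameters stored for the held transfer */
    uint32_t last_len;       /* length of the most recent transfer */
    unsigned transfers_done; /* completed transfers, for inspection only */
} EspModel;

/* Boards without the signal start with dma_ready set (assumed opt-in). */
static void esp_model_init(EspModel *s, bool has_ready_line)
{
    s->dma_ready = !has_ready_line;
    s->transfer_pending = false;
    s->pending_len = 0;
    s->last_len = 0;
    s->transfers_done = 0;
}

/* Placeholder for the real DMA read of the CDB. */
static void esp_model_do_transfer(EspModel *s, uint32_t len)
{
    assert(s->dma_ready);
    s->last_len = len;
    s->transfers_done++;
}

/* DMA -> ESP signal handler: non-zero level means DMA was enabled. */
static void esp_model_dma_ready(void *opaque, int line, int level)
{
    EspModel *s = opaque;

    (void)line;
    s->dma_ready = (level != 0);
    if (s->dma_ready && s->transfer_pending) {
        /* A command was waiting for DMA: start it now. */
        s->transfer_pending = false;
        esp_model_do_transfer(s, s->pending_len);
    }
}

/* Guest issued a DMA command such as select with attention. */
static void esp_model_command(EspModel *s, uint32_t len)
{
    if (s->dma_ready) {
        esp_model_do_transfer(s, len);   /* same as today */
    } else {
        s->transfer_pending = true;      /* hold until DMA is enabled */
        s->pending_len = len;
    }
}

typedef void (*ReadyLineHandler)(void *opaque, int line, int level);

/* Hypothetical stand-in for the DMA controller side of the signal. */
typedef struct DmaModel {
    bool enabled;
    ReadyLineHandler ready_handler;  /* NULL if the board does not wire it */
    void *ready_opaque;
} DmaModel;

/* Raise or lower the line whenever the enable state changes. */
static void dma_model_set_enabled(DmaModel *d, bool enabled)
{
    if (d->enabled == enabled) {
        return;
    }
    d->enabled = enabled;
    if (d->ready_handler != NULL) {
        d->ready_handler(d->ready_opaque, 0, enabled ? 1 : 0);
    }
}
```

With the NetBSD 1.6-3.1 ordering (command first, DMA enabled later),
esp_model_command() would only record the request, and the transfer
would run from esp_model_dma_ready() once dma_model_set_enabled(..., true)
is called. With the other ordering nothing changes compared to now.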