[Qemu-devel] Re: sparc scsi/iommu/dma (was NMI handling)

Blue Swirl Fri, 13 Aug 2010 13:41:29 -0700

On Fri, Aug 13, 2010 at 8:25 PM, Artyom Tarasenko
<atar4q...@googlemail.com> wrote:
> 2010/7/30 Blue Swirl <blauwir...@gmail.com>:
>> On Tue, Jul 27, 2010 at 8:10 PM, Artyom Tarasenko
>> <atar4q...@googlemail.com> wrote:
>>> 2010/7/27 Blue Swirl <blauwir...@gmail.com>:
>>>> On Mon, Jul 26, 2010 at 10:23 PM, Artyom Tarasenko
>>>> <atar4q...@googlemail.com> wrote:
>>>>> 2010/7/26 Blue Swirl <blauwir...@gmail.com>:
>>>>>> On Mon, Jul 26, 2010 at 4:53 PM, Artyom Tarasenko
>>>>>> <atar4q...@googlemail.com> wrote:
>>>>>>> 2010/6/21 Artyom Tarasenko <atar4q...@googlemail.com>:
>>>>>>>> 2010/5/25 Blue Swirl <blauwir...@gmail.com>:
>>>>>>>>>>> About bugs, IIRC NetBSD 3.x crash could be related to IOMMU.
>>>>>>>>>>
>>>>>>>>>> What indicates that? It happens where the disk sizes are normally
>>>>>>>>>> reported, so it could be a scsi/dma/irq/fpu issue as well.
>>>>>>>>>
>>>>>>>>> IIRC the DVMA address was 0xfc004000, but the mapped entries were for
>>>>>>>>> 0xfc000000 to 0xfc003fff.
>>>>>>>
>>>>>>> Under OpenBIOS. And even less with OBP, and much less if the network
>>>>>>> card is disabled.
>>>>>>>
>>>>>>>> It looks like we have multiple problems here: they start with
>>>>>>>> 0xfc004000 access (which can theoretically be expected on the real
>>>>>>>> hardware too) as you pointed out, but what happens afterwards is
>>>>>>>> strange too:
>>>>>>>>
>>>>>>>> - In the current qemu implementation we have a screaming NMI which
>>>>>>>> NetBSD cannot clear. This happens because the NMI in qemu is literally
>>>>>>>> non-maskable, while on the real hardware it can be masked with the
>>>>>>>> 'mask all' flag. I'll send a patch for it.
>>>>>>>>
>>>>>>>> - With the masking patch, the NMI is no longer screaming but is still
>>>>>>>> perceived as spurious. This may be ok if NetBSD (1.6-3.1) doesn't have
>>>>>>>> a moduleerr_handler set.
>>>>>>>
>>>>>>> Or because a scsi dma transfer on real hardware never generates an NMI.
>>>>>>>
>>>>>>> In the current implementation, when "select with attention" is
>>>>>>> processed, scsi controller initiates a dma transfer and fetches a CDB.
>>>>>>> If dma fails (not mapped, or not allowed), an NMI is generated. It is
>>>>>>> quite a strange design: such an error is an asynchronous event, and
>>>>>>> the CPU wouldn't know that the scsi controller tried to do some dma at
>>>>>>> a certain address. It would have been more consistent to send the error
>>>>>>> notification to the dma initiator (the scsi controller in this case), not
>>>>>>> to the CPU.
>>>>>>>
>>>>>>> The offending code in NetBSD 1.6-3.1:
>>>>>>>
>>>>>>> // Crashes here (under qemu) because the dma page is not yet valid:
>>>>>>> NCRCMD(sc, NCRCMD_SELATN | NCRCMD_DMA);
>>>>>>> // The page would have been made valid here:
>>>>>>> NCRDMA_SETUP(sc, &sc->sc_cmdp, &sc->sc_cmdlen, 0, &dmasize);
>>>>>>> NCRDMA_GO(sc);
>>>>>>>
>>>>>>> In the working versions (before 1.6 and after 4.0) the code looks like this:
>>>>>>>
>>>>>>> NCRDMA_SETUP(sc, &sc->sc_cmdp, &sc->sc_cmdlen, 0, &dmasize);
>>>>>>> //...
>>>>>>> NCRCMD(sc, NCRCMD_SELATN | NCRCMD_DMA);
>>>>>>> NCRDMA_GO(sc);
>>>>>>>
>>>>>>> After debugging the code on the real hardware, it looks like qemu has
>>>>>>> multiple problems in the scsi/dma/iommu layer.
>>>>>>>
>>>>>>> I modified NCRDMA_SETUP so that it performs the dma transfer without
>>>>>>> mapping the page. In this case NetBSD 3.1 shows the following error (on
>>>>>>> a real SS-20):
>>>>>>>
>>>>>>> dma0: error: csr=a4440212<ERR,DRAINING=0,IEN,ENDMA,BURST=1,FASTER,ALOADED>
>>>>>>> esp0: DMA error; resetting
>>>>>>> dma0: error: csr=a4440212<ERR,DRAINING=0,IEN,ENDMA,BURST=1,FASTER,ALOADED>
>>>>>>>
>>>>>>> no NMI.
>>>>>>>
>>>>>>> And what is more important, on the real hardware "select with
>>>>>>> attention" does not initiate dma (I put in a delay, waited 2 seconds,
>>>>>>> and nothing happened). It has to be done manually.
>>>>>>>
>>>>>>> Any suggestions on how to fix it within the current iommu/dma
>>>>>>> architecture? It looks like "select with attention" should register
>>>>>>> callbacks? ( Volunteers? ;-) )
>>>>>>
>>>>>> Excellent analysis!
>>>>>>
>>>>>> About NMI: IOMMU just raises the qemu_irq provided by sun4m.c. The
>>>>>> interrupt bit number is currently 30, which is Module Error
>>>>>> (asynchronous fault). Maybe this should be 29, MSI (MBus-SBus
>>>>>> Interface) interrupt? That is still NMI though. Could you check what
>>>>>> interrupt bits get active in the interrupt controller master status?
>>>>>
>>>>> 80000 (system timer/level 10 only), but I have to check that no one
>>>>> steals it.
>>>>>
>>>>>> What is in IOMMU AFSR?
>>>>>
>>>>> 0, as well as AFAR. I have to ensure that no one steals it before I
>>>>> read it, though. On real hardware, tracking registers is trickier than
>>>>> under emulation.
>>>>>
>>>>>> About select with attention: NCRDMA_GO just tweaks DMA controller, so
>>>>>> ESP shouldn't perform the transfer if DMA is not ready.
>>>>>
>>>>> You mean the controller is tweaked before satn, then NCRDMA_GO
>>>>> allows the transfer, and then ESP performs it?
>>>>> To me it looks like NCRDMA_SETUP not only sets up DVMA, but also
>>>>> programs the CSR to actually perform the transfer:
>>>>> http://fxr.watson.org/fxr/source/dev/ic/lsi64854.c?v=NETBSD3#L359
>>>>
>>>> Almost, DMA_GO finally enables DMA:
>>>> http://fxr.watson.org/fxr/source/dev/ic/lsi64854var.h?v=NETBSD3#L96
>>>
>>> Yes, but doesn't setting target and csr registers imply the command is
>>> transferred manually, and not by ESP via "Select with attention" ?
>>
>> I'd suppose target and csr have no effect until DMA is enabled.
>
> Yes, but the question is which device triggers the dma when it gets
> enabled. IIUYC, you say that "select with attention" makes ESP
> perform a DMA transfer. But I think ESP doesn't do it; the transfer is
> instead done by the dma controller, as programmed in its registers,
> when the DMA gets enabled.


Funny, that is also what I meant. I thought you meant that setting
target and csr already started the transfer.

> I suggest that select with/without attention not call s->dma_memory_read
> directly, but rather pass a callback to the DMA controller, so that the
> controller invokes this callback when/if dma_memory_read happens.

Something like that. I looked at this briefly, but queuing the ESP
actions does not seem very easy.
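
A minimal sketch of the callback idea above, in plain C outside of qemu.
Every name here (SketchDMA, SketchESP, sketch_dma_defer and so on) is
hypothetical and made up for illustration; none of it is existing qemu
code, and the flat byte array merely stands in for guest memory behind
the IOMMU. The point is only the ordering: "select with attention"
queues the CDB fetch, and the fetch runs when DMA gets enabled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types; none of these names exist in qemu. */
typedef void (*SketchDMAPendingFn)(void *opaque);

typedef struct SketchDMA {
    bool enabled;               /* models the ENDMA bit of the DMA CSR */
    const uint8_t *mem;         /* stand-in for guest memory */
    size_t mem_len;
    SketchDMAPendingFn pending; /* work queued by the device, or NULL */
    void *pending_opaque;
} SketchDMA;

typedef struct SketchESP {
    SketchDMA *dma;
    uint8_t cdb[16];
    size_t cdb_len;
    bool cdb_fetched;
} SketchESP;

/* Copy len bytes at addr; fails if DMA is disabled or the range is bad. */
bool sketch_dma_read(SketchDMA *d, size_t addr, uint8_t *buf, size_t len)
{
    if (!d->enabled || addr > d->mem_len || len > d->mem_len - addr) {
        return false;
    }
    memcpy(buf, d->mem + addr, len);
    return true;
}

/* Run fn now if DMA is enabled, otherwise remember it for later. */
void sketch_dma_defer(SketchDMA *d, SketchDMAPendingFn fn, void *opaque)
{
    if (d->enabled) {
        fn(opaque);
        return;
    }
    d->pending = fn;
    d->pending_opaque = opaque;
}

/* Guest toggles ENDMA; enabling flushes any queued device work. */
void sketch_dma_set_enable(SketchDMA *d, bool on)
{
    d->enabled = on;
    if (on && d->pending) {
        SketchDMAPendingFn fn = d->pending;
        void *opaque = d->pending_opaque;
        d->pending = NULL;
        d->pending_opaque = NULL;
        fn(opaque);
    }
}

/* The deferred part of "select with attention": fetch the CDB. */
static void sketch_esp_fetch_cdb(void *opaque)
{
    SketchESP *s = opaque;
    s->cdb_fetched = sketch_dma_read(s->dma, 0, s->cdb, s->cdb_len);
}

/* SATN no longer reads memory itself; it only queues the fetch. */
void sketch_esp_select_with_atn(SketchESP *s, size_t cdb_len)
{
    s->cdb_len = cdb_len > sizeof(s->cdb) ? sizeof(s->cdb) : cdb_len;
    s->cdb_fetched = false;
    sketch_dma_defer(s->dma, sketch_esp_fetch_cdb, s);
}
```

With this ordering, the NetBSD 1.6-3.1 sequence (SELATN first, then
NCRDMA_SETUP and NCRDMA_GO) would not touch the unmapped page at SELATN
time. The sketch holds only one pending callback, which sidesteps rather
than solves the queuing problem mentioned above.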

>>>>>> I think Linux always pre-programs DMA.
>>>>>>
>>>>>> One way to handle this would be to add a qemu_irq signal from DMA to
>>>>>> ESP which tells ESP whether DMA is ready. DMA raises or lowers the
>>>>>> interrupt whenever DMA is enabled or disabled. When the IRQ is
>>>>>> received by ESP, if there is no transfer pending, it just adjusts an
>>>>>> internal flag about DMA status. If there is a transfer pending, it is
>>>>>> started. When ESP handles a command, it should check the internal DMA
>>>>>> flag. If DMA is ready, continue with the transfer immediately like
>>>>>> now. Otherwise, hold the transfer and store parameters to internal
>>>>>> state.
>>>>>
>>>>> Sounds reasonable. So you suggest that ESP dma channel is wired to
>>>>> ESP, and not slavio? Or to both of them?
>>>
>>> On second thought, doesn't setting a callback seem more appropriate?
>>> Select with(out) attention would just connect the target to the esp,
>>> and dma would communicate with the target via the esp callback.
>>
>> The signal approach should be more generic. ESP is also used by MIPS.
>
> But MIPS also has a dma controller. What can be the problem? Or is it
> known that the current dma/select with(out) attention implementation
> works like the real MIPS hardware?

It's not the SATN thing, but making the changes opt-in. If we add a
callback, MIPS code should probably be changed immediately. With the
IRQ solution, that can be postponed until someone who knows MIPS
hardware wants to implement the signal.
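
A rough sketch of the signal variant, again in standalone C with
hypothetical names (SigESP, sig_esp_dma_ready and so on are invented
here, not qemu code). The handler has the same shape as a qemu_irq
handler (opaque, line number, level). The opt-in part is modelled by an
assumption of this sketch: the flag defaults to "ready", so a board that
never wires the line keeps the current immediate-transfer behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device state; not the real ESP state in qemu. */
typedef struct SigESP {
    bool dma_ready;        /* last level seen on the DMA-ready line */
    bool xfer_pending;     /* a command is waiting for DMA */
    uint32_t pending_len;  /* parameters saved for the held transfer */
    unsigned xfers_done;   /* counter, only to make the sketch observable */
} SigESP;

void sig_esp_reset(SigESP *s)
{
    /* Default to "ready": an unwired line means today's behaviour. */
    s->dma_ready = true;
    s->xfer_pending = false;
    s->pending_len = 0;
    s->xfers_done = 0;
}

static void sig_esp_do_transfer(SigESP *s, uint32_t len)
{
    (void)len;             /* a real device would move len bytes here */
    s->xfers_done++;
}

/* Called when the DMA controller raises or lowers the ready line. */
void sig_esp_dma_ready(void *opaque, int n, int level)
{
    SigESP *s = opaque;

    (void)n;
    s->dma_ready = (level != 0);
    if (s->dma_ready && s->xfer_pending) {
        /* A transfer was held back earlier: start it now. */
        s->xfer_pending = false;
        sig_esp_do_transfer(s, s->pending_len);
    }
}

/* Called when the guest issues a command that needs DMA. */
void sig_esp_handle_command(SigESP *s, uint32_t len)
{
    if (s->dma_ready) {
        sig_esp_do_transfer(s, len);   /* continue immediately, as now */
    } else {
        s->xfer_pending = true;        /* hold and remember parameters */
        s->pending_len = len;
    }
}
```

A board that wires the line would lower it at reset (DMA disabled) and
raise it when the guest enables DMA; a board that does nothing would see
no change in behaviour.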

[Qemu-devel] Re: sparc scsi/iommu/dma (was NMI handling)
