On Mon, Sep 17, 2018 at 10:09:33AM -0500, Brijesh Singh wrote: [...]
> > > +static int amdvi_int_remap_legacy(AMDVIState *iommu, > > > + MSIMessage *origin, > > > + MSIMessage *translated, > > > + uint64_t *dte, > > > + X86IOMMUIrq *irq, > > > + uint16_t sid) > > > +{ > > > + int ret; > > > + union irte irte; > > > + > > > + /* get interrupt remapping table */ > > > + ret = amdvi_get_irte(iommu, origin, dte, &irte, sid); > > > + if (ret < 0) { > > > + return ret; > > > + } > > > + > > > + if (!irte.fields.valid) { > > > + trace_amdvi_ir_target_abort("RemapEn is disabled"); > > > > Note that recently QEMU introduced error_report_once(). Feel free to > > switch to that if you want. Many of the VT-d emulation code switched > > to that to make sure errors like this will be dumped at least once and > > we're also free from Dos attack. This should at least apply to all of > > your below trace_amdvi_ir_target_abort() calls, even some other traces. > > > > > IMHO we should not be using error_report_once() here. It's possible that > guest OS have DTE[IV]=1 but has not programmed the interrupt > remapping entries or have deactivated the remapping. I see that Linux > OS does it all the time and in those cases we will be printing out > error messages - which will give a wrong information. I would be a bit curious on why the guest would like to switch the irq off if the device still needs to generate any... though this question could be out of topic so feel free to skip. But if that's normal then I would suggest you use a better naming of the trace point, "*_target_abort" seems to be a severe issue. Also, my suggestion is a general one - it still applies to the places that you think a well-behaved guest should not do. It's optional so you can decide whether you want to use it. [...] > > > + /* validate that we are configure with intremap=on */ > > > + if (!s->intr_enabled) { > > > + error_report("Interrupt remapping is enabled in the guest but " > > > + "not in the host. Use intremap=on to enable > > > interrupt " > > > + "remapping in amd-iommu."); > > > + exit(1); > > > > This will crash QEMU. IMHO this is not a configuration error but a > > guest kernel error. We'd better not crash QEMU for that. Wny not > > just report the error upwards. > > > > > Honestly, I did struggled whether to use "exit" or error. Actually this > is a configuration error. > > AMD IOMMU does not have an explicit bit to indicate whether the IR > feature is supported. All versions of IOMMU supports IR. But in QEMU > we are using 'intermap' to enable/disable the feature. In case of AMD IOMMU, > we will not be able to link intremap to control the exposure of > this feature to the guest. So we need to do some checks during the > runtime. If we get asked to remap the interrupts (when intremap=off) > then IMHO its a configuration error and we should kill the guest. > Having said so, I am flexible and if needed then I can remove the > exit from next patch. If all AMD IOMMU has IR, then I'm curious how the old AMD IOMMU emulation worked since that does not have an IR, do you know how the guest detects that? (It must have done so otherwise IRQ won't work there, isn't it?) And if that's true, instead of check intr_enabled here, I would suggest you check it in realize() - we'd better crash asap when we know this error, and we know it from the very beginning. > > > (btw, patch 6 is Linux specific quirk) > > > > > + } > > > + > > > + return true; > > > +} > > > + > > > /* Interrupt remapping for MSI/MSI-X entry */ > > > static int amdvi_int_remap_msi(AMDVIState *iommu, > > > MSIMessage *origin, > > > MSIMessage *translated, > > > uint16_t sid) > > > { > > > + int ret = 0; > > > + uint64_t pass = 0; > > > + uint64_t dte[4] = { 0 }; > > > + X86IOMMUIrq irq = { 0 }; > > > + uint8_t dest_mode, delivery_mode; > > > + > > > assert(origin && translated); > > > + /* > > > + * When IOMMU is enabled, interrupt remap request will come either > > > from > > > + * IO-APIC or PCI device. If interrupt is from PCI device then it > > > will > > > + * have a valid requester id but if the interrupt is from IO-APIC > > > + * then requester id will be invalid. > > > + */ > > > + if (sid == X86_IOMMU_SID_INVALID) { > > > + sid = AMDVI_IOAPIC_SB_DEVID; > > > + } > > > + > > > trace_amdvi_ir_remap_msi_req(origin->address, origin->data, sid); > > > - if (!iommu || !iommu->intr_enabled) { > > > + /* verify that interrupt remapping is enabled before going further. > > > */ > > > + if (!iommu || > > > + !amdvi_get_dte(iommu, sid, dte) || > > > + !amdvi_validate_int_reamp(iommu, dte)) { > > > memcpy(translated, origin, sizeof(*origin)); > > > goto out; > > > } > > > @@ -1055,10 +1184,68 @@ static int amdvi_int_remap_msi(AMDVIState *iommu, > > > return -AMDVI_IR_ERR; > > > } > > > + delivery_mode = (origin->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 7; > > > > IIRC you mentioned that you'll paste some field definition here, or > > which figure/table we should refer to. But here I still see no > > comment, and I don't know where defined it. Again, Figure 14, bits > > 10:8 are part of index there. > > > This time used the pre-defined macro's from the apic-msidef.h hence I > thought there was no need to include the field definitions. But, I will > go ahead and include the field definition in the next rev. > > * The delivery_mode field from the MSI data is used to get the upstream > interrupt type. > * If the interrupt type is Fixed/Arbitrated, MSI data 10:0 is used > as an offset within the IRTE table to get the actual IRTE entry > (Fig 14). > > Please note that we don't emulate the HyperTransport hence instead we > should not treat MSI data 10:8 as index instead the MSI data 10:0 should > be used as "offset" within the IRTE table to get the interrupt remapping > table entries. Frankly speaking until now I don't quite understand what's that HyperTransport thing... but ok I think I understand your point now: you mean that the "Interrupt type" mentioned in Table 19 is exactly the same bits (10:8) defined in general MSI data register. I think that's the only knowledge that I'm missing from there. Say, Table 19 told me that "what interrupt type should be controlled by what", however it never told me where could I find the interrupt types... or is it defined somewhere but I didn't notice? (I do feel hard to read the AMD spec some times...) If my understanding is correct, maybe you could comment above the delivery_mode with things like "Please refer to Table 19; delivery mode is defined as the same as general MSI data format on bits 10:8". Thanks, -- Peter Xu