Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets
On Wed, Apr 03, 2013 at 05:53:27PM -0600, Bjorn Helgaas wrote: [+cc David and iommu list, Yinghai, Jiang] On Mon, Mar 4, 2013 at 12:04 PM, Neil Horman nhor...@tuxdriver.com wrote: A few years back intel published a spec update: http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf For the 5520 and 5500 chipsets which contained an errata (specificially errata 53), which noted that these chipsets can't properly do interrupt remapping, and as a result the recommend that interrupt remapping be disabled in bios. While many vendors have a bios update to do exactly that, not all do, and of course not all users update their bios to a level that corrects the problem. As a result, occasionally interrupts can arrive at a cpu even after affinity for that interrupt has be moved, leading to lost or spurrious interrupts (usually characterized by the message: kernel: do_IRQ: 7.71 No irq handler for vector (irq -1) There have been several incidents recently of people seeing this error, and investigation has shown that they have system for which their BIOS level is such that this feature was not properly turned off. As such, it would be good to give them a reminder that their systems are vulnurable to this problem. Signed-off-by: Neil Horman nhor...@tuxdriver.com CC: Prarit Bhargava pra...@redhat.com CC: Don Zickus dzic...@redhat.com CC: Don Dutile ddut...@redhat.com CC: Bjorn Helgaas bhelg...@google.com CC: Asit Mallick asit.k.mall...@intel.com CC: linux-...@vger.kernel.org --- Change notes: v2) * Moved the quirk to the x86 arch, since consensus seems to be that the 55XX chipset series is x86 only. I decided however to keep the quirk as a regular quirk, not an early_quirk. Early quirks have no way currently to determine if BIOS has properly disabled the feature in the iommu, at least not without significant hacking, and since its quite possible this will be a short lived quirk, should Don Z's workaround code prove successful (and it looks like it may well), I don't think that necessecary. * Removed the WARNING banner from the quirk, and added the HW_ERR token to the string, I opted to leave the newlines in place however, as I really couldnt find a way to keep the text on a single line is still legible from a code perspective. I think theres enough language in there that using cscope on just about any substring however will turn it up, and again, this may be a short lived quirk. --- arch/x86/kernel/quirks.c | 18 ++ include/linux/pci_ids.h | 2 ++ 2 files changed, 20 insertions(+) diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c index 26ee48a..a718ea2 100644 --- a/arch/x86/kernel/quirks.c +++ b/arch/x86/kernel/quirks.c @@ -5,6 +5,7 @@ #include linux/irq.h #include asm/hpet.h +#include ../../../drivers/iommu/irq_remapping.h #if defined(CONFIG_X86_IO_APIC) defined(CONFIG_SMP) defined(CONFIG_PCI) @@ -567,3 +568,20 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_15H_NB_F5, quirk_amd_nb_node); #endif + +static void intel_remapping_check(struct pci_dev *dev) +{ + u8 revision; + + pci_read_config_byte(dev, PCI_REVISION_ID, revision); + + if ((revision == 0x13) irq_remapping_enabled) { +pr_warn(HW_ERR This system BIOS has enabled interrupt remapping\n +on a chipset that contains an errata making that\n +feature unstable. Please reboot with nointremap\n +added to the kernel command line and contact\n +your BIOS vendor for an update); + } +} +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_5520_IOHUB, intel_remapping_check); +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_5500_IOHUB, intel_remapping_check); This started as an IOMMU change, and I'm not an expert in that area, so I added David and the IOMMU list. I'd rather have him deal with this than me. This was never an iommu change, other than the fact that interrupt mapping and the iommu are tightly coupled. Is this something we can just *fix* in the kernel, e.g., by turning off interrupt remapping ourselves, or does it have to be done before the OS boots? Possibly, but it didn't seem safe to me. As it currently stands, pci quirks are executed after the apic is configured, which means its possible the problem might trigger before the quirk has a chance to execute (even the early quirk). There was an exchange about this earlier in the thread IIRC. diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h index 31717bd..54027a6 100644 --- a/include/linux/pci_ids.h +++ b/include/linux/pci_ids.h @@ -2732,6 +2732,8 @@
RE: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation.
-Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Wednesday, April 03, 2013 11:32 PM To: Joerg Roedel Cc: Sethi Varun-B16395; Yoder Stuart-B08248; Wood Scott-B07421; iommu@lists.linux-foundation.org; linuxppc-...@lists.ozlabs.org; linux- ker...@vger.kernel.org; ga...@kernel.crashing.org; b...@kernel.crashing.org Subject: Re: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation. On Tue, 2013-04-02 at 18:18 +0200, Joerg Roedel wrote: Cc'ing Alex Williamson Alex, can you please review the iommu-group part of this patch? Sure, it looks pretty reasonable. AIUI, all PCI devices are below some kind of host bridge that is either new and supports partitioning or old and doesn't. I don't know if that's a visibility or isolation requirement, perhaps PCI ACS-ish. In the new host bridge case, each device gets a group. This seems not to have any quirks for multifunction devices though. On AMD and Intel IOMMUs we test multifunction device ACS support to determine whether all the functions should be in the same group. Is there any reason to trust multifunction devices on PAMU? [Sethi Varun-B16395] In the case where we can partition endpoints we can distinguish transactions based on the bus,device,function number combination. This support is available in the PCIe controller (host bridge). I also find it curious what happens to the iommu group of the host bridge. In the partitionable case the host bridge group is removed, in the non-partitionable case the host bridge group becomes the group for the children, removing the host bridge. It's unique to PAMU so far that these host bridges are even in an iommu group (x86 only adds pci devices), but I don't see it as necessarily wrong leaving it in either scenario. Does it solve some problem to remove them from the groups? Thanks, [Sethi Varun-B16395] The PCIe controller isn't a partitionable entity, it would always be owned by the host. -Varun ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] intel-iommu: Synchronize gcmd value with global command register
On Thu, 2013-04-04 at 14:48 +0900, Takao Indoh wrote: - DMAR fault messages floods and second kernel does not boot. Recently I saw similar report. https://lkml.org/lkml/2013/3/8/120 Right. So the fix for that is to make the subsequent errors silent, until/unless we actually get a request to create a mapping for the given device. - igb driver detectes error on linkup and kdump via network fails. That's a driver bug, IIRC. It was failing to completely reset the hardware. It's fixed now, isn't it? - On a certain platform, though kdump itself works, PCIe error like Unexpected Completion is detected and it gets hardware degraded. More information required. -- dwmw2 smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets
On Thu, Apr 04, 2013 at 08:57:06AM -0600, Bjorn Helgaas wrote: On Thu, Apr 4, 2013 at 8:50 AM, Neil Horman nhor...@tuxdriver.com wrote: On Thu, Apr 04, 2013 at 03:27:29PM +0100, David Woodhouse wrote: On Wed, 2013-04-03 at 17:53 -0600, Bjorn Helgaas wrote: ); + + if ((revision == 0x13) irq_remapping_enabled) { +pr_warn(HW_ERR This system BIOS has enabled interrupt remapping\n +on a chipset that contains an errata making that\n +feature unstable. Please reboot with nointremap\n +added to the kernel command line and contact\n +your BIOS vendor for an update); This should be WARN_TAINT(TAINT_FIRMWARE_WORKAROUND). And 'an erratum'. Ok, copy that. I'll repost shortly When you do, please include URLs for any problem reports or bugzillas you have. Well, those are going to be vendor specific, so I'm not sure we can really do that, at least not in any meaningful way. I assume Windows just works in this situation? No more or less than linux does in this case. The Intel provided errata indicates that the only acceptable workaround is to disable remapping in the BIOS, so I would presume that if a windows system has a BIOS that doesn't implement this fix, its just as exposed as we are. Neil ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation.
-Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Thursday, April 04, 2013 8:52 PM To: Sethi Varun-B16395 Cc: Joerg Roedel; Yoder Stuart-B08248; Wood Scott-B07421; iommu@lists.linux-foundation.org; linuxppc-...@lists.ozlabs.org; linux- ker...@vger.kernel.org; ga...@kernel.crashing.org; b...@kernel.crashing.org Subject: Re: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation. On Thu, 2013-04-04 at 13:00 +, Sethi Varun-B16395 wrote: -Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Wednesday, April 03, 2013 11:32 PM To: Joerg Roedel Cc: Sethi Varun-B16395; Yoder Stuart-B08248; Wood Scott-B07421; iommu@lists.linux-foundation.org; linuxppc-...@lists.ozlabs.org; linux- ker...@vger.kernel.org; ga...@kernel.crashing.org; b...@kernel.crashing.org Subject: Re: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation. On Tue, 2013-04-02 at 18:18 +0200, Joerg Roedel wrote: Cc'ing Alex Williamson Alex, can you please review the iommu-group part of this patch? Sure, it looks pretty reasonable. AIUI, all PCI devices are below some kind of host bridge that is either new and supports partitioning or old and doesn't. I don't know if that's a visibility or isolation requirement, perhaps PCI ACS-ish. In the new host bridge case, each device gets a group. This seems not to have any quirks for multifunction devices though. On AMD and Intel IOMMUs we test multifunction device ACS support to determine whether all the functions should be in the same group. Is there any reason to trust multifunction devices on PAMU? [Sethi Varun-B16395] In the case where we can partition endpoints we can distinguish transactions based on the bus,device,function number combination. This support is available in the PCIe controller (host bridge). So can x86 IOMMUs, that's the visibility aspect of IOMMU groups. Visibility alone doesn't necessarily imply that a device is isolated though. A multifunction PCI device that doesn't expose ACS support may not isolate functions from each other. For example a peer-to-peer DMA between functions may not be translated by the upstream IOMMU. IOMMU groups should encompass both visibility and isolation. [Sethi Varun-B16395] We can isolate the DMA access to the host based on the to the pci bus,device,function number. I thought that was enough to put devices in to separate iommu groups. This is a PCIe controller property which allows us to partition PCIe devices. But, what I can understand from your point is that we also need to consider isolation at PCIe device level as well. I will check for the case of multifunction devices. I also find it curious what happens to the iommu group of the host bridge. In the partitionable case the host bridge group is removed, in the non-partitionable case the host bridge group becomes the group for the children, removing the host bridge. It's unique to PAMU so far that these host bridges are even in an iommu group (x86 only adds pci devices), but I don't see it as necessarily wrong leaving it in either scenario. Does it solve some problem to remove them from the groups? Thanks, [Sethi Varun-B16395] The PCIe controller isn't a partitionable entity, it would always be owned by the host. Ownership of a device shouldn't play into the group context. An IOMMU group should be defined by it's visibility and isolation from other devices. Whether the PCIe controller is allowed to be handed to userspace is a question for VFIO. [Sethi Varun-B16395] The problem is in the case, where we can't partition PCIe devices. PCIe devices share the same device group as the PCI controller. This becomes a problem while assigning the devices to the guest, as you are required to unbind all the PCIe devices including the controller from the host. PCIe controller can't be unbound from the host, so we simply delete the controller iommu_group. -Varun ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] documentation: iommu: add description of ARM System MMU binding
This patch adds a description of the device tree binding for the ARM System MMU architecture. Cc: Rob Herring robherri...@gmail.com Cc: Andreas Herrmann andreas.herrm...@calxeda.com Signed-off-by: Will Deacon will.dea...@arm.com --- Hello, The driver for this is still a WIP. Both Andreas and myself have prototype code, but we're planning to merge that together to get something more general. Deciding on the binding is a good first step. All comments welcome, Will .../devicetree/bindings/iommu/arm,smmu.txt | 61 ++ 1 file changed, 61 insertions(+) create mode 100644 Documentation/devicetree/bindings/iommu/arm,smmu.txt diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt new file mode 100644 index 000..938325f --- /dev/null +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt @@ -0,0 +1,61 @@ +* ARM System MMU Architecture Implementation + +ARM SoCs may contain an implementation of the ARM System Memory +Management Unit Architecture, which can be used to provide 1 or 2 stages +of address translation to bus masters external to the CPU. + +The SMMU may also raise interrupts in response to various fault +conditions. + +** System MMU required properties: + +- compatible: Should be one of arm,smmu-v1 or arm,smmu-v2 + depending on the version of the architecture + implemented. + +- reg : Base address and size of the SMMU. + +- #global-interrupts : The number of global interrupts exposed by the + device. + +- interrupts: Interrupt list, with the first #global-irqs entries + corresponding to the global interrupts and any + following entries corresponding to context interrupts, + specified in order of their indexing by the SMMU. + +- mmu-masters : A list of phandles to device nodes representing bus + masters for which the SMMU can provide a translation. + +- stream-ids: A list of 16-bit values corresponding to the StreamIDs + for the devices listed in the mmu-masters property. + This list must be same length as mmu-masters, so + masters with multiple stream-ids will have multiple + entries in mmu-masters. + +** System MMU optional properties: + +- smmu-parent : When multiple SMMUs are chained together, this + property can be used to provide a phandle to the + parent SMMU (that is the next SMMU on the path going + from the mmu-masters towards memory) node for this + SMMU. + +Example: + +smmu { +compatible = arm,smmu-v1; +reg = 0xba5e 0x1; +#global-interrupts = 2; +interrupts = 0 32 4, + 0 33 4, + 0 34 4, /* This is the first context interrupt */ + 0 35 4, + 0 36 4, + 0 37 4; +mmu-masters = dma0, + dma0, + dma1; +stream-ids = 0xd01d, + 0xd01e, + 0xd11c; +}; -- 1.8.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets
On Thu, Apr 4, 2013 at 9:39 AM, Neil Horman nhor...@tuxdriver.com wrote: On Thu, Apr 04, 2013 at 08:57:06AM -0600, Bjorn Helgaas wrote: On Thu, Apr 4, 2013 at 8:50 AM, Neil Horman nhor...@tuxdriver.com wrote: On Thu, Apr 04, 2013 at 03:27:29PM +0100, David Woodhouse wrote: On Wed, 2013-04-03 at 17:53 -0600, Bjorn Helgaas wrote: ); + + if ((revision == 0x13) irq_remapping_enabled) { +pr_warn(HW_ERR This system BIOS has enabled interrupt remapping\n +on a chipset that contains an errata making that\n +feature unstable. Please reboot with nointremap\n +added to the kernel command line and contact\n +your BIOS vendor for an update); This should be WARN_TAINT(TAINT_FIRMWARE_WORKAROUND). And 'an erratum'. Ok, copy that. I'll repost shortly When you do, please include URLs for any problem reports or bugzillas you have. Well, those are going to be vendor specific, so I'm not sure we can really do that, at least not in any meaningful way. Sorry, I don't understand your point. It's useful to know who reported it (e.g., for future testers) and what happened and what bugzillas it solved. Of course it applies only to machines with this chipset and certain BIOS revisions. I assume Windows just works in this situation? No more or less than linux does in this case. The Intel provided errata indicates that the only acceptable workaround is to disable remapping in the BIOS, so I would presume that if a windows system has a BIOS that doesn't implement this fix, its just as exposed as we are. It sounds like the effect of this bug is that on Linux, devices may not work at all because of lost interrupts. Either Windows must never enable remapping (so it never sees the bug), or it must be designed in a way that tolerates the problem. I can't believe these machines shipped with Windows certification if devices didn't work correctly. Either way, I don't understand why we can't make the quirk just fix this. Booting with nointremap only sets disable_irq_remap, which is only used by irq_remapping_supported(). Early quirks are run before irq_remapping_supported () is ever called, so an early quirk ought to be just as effective as the command line option. Here's the relevant call tree I see: start_kernel setup_arch parse_early_param early_quirks rest_init ... first use of disable_irq_remap The x86 setup_arch() does call generic_apic_probe(), but as far as I can tell, none of the APIC .probe() methods reference disable_irq_remap, so that doesn't look like a problem. Bjorn ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4] Quirk for buggy dma source tags with Intel IOMMU.
On Thu, Mar 7, 2013 at 7:35 PM, Andrew Cooks aco...@gmail.com wrote: This patch creates a quirk to allow the Intel IOMMU to be enabled for devices that use incorrect tags during DMA. It is similar to the quirk for Ricoh devices, but allows mapping multiple functions and mapping of 'ghost' functions that do not correspond to real devices. Devices that need this include a variety of Marvell 88SE91xx based SATA controllers. [1][2] Changelog: v4: Process feedback received from Alex Williamson. * don't assume function 0 is a real device. * exit early if no ghost functions are known, or all known functions have been mapped. * cleanup failure case so mapping succeeds or fails for all ghost functions per device. * improve comments. v3: * Adopt David Woodhouse's terminology by referring to the quirky functions as 'ghost' functions. * Unmap ghost functions when device is detached from IOMMU. * Stub function for when CONFIG_PCI_QUIRKS is not enabled. This patch was generated against 3.9-rc1, but will also apply to 3.7.10. Bug reports: 1. https://bugzilla.redhat.com/show_bug.cgi?id=757166 2. https://bugzilla.kernel.org/show_bug.cgi?id=42679 Signed-off-by: Andrew Cooks aco...@gmail.com --- drivers/iommu/intel-iommu.c | 69 +++ drivers/pci/quirks.c| 67 +- include/linux/pci.h |5 +++ include/linux/pci_ids.h |1 + 4 files changed, 141 insertions(+), 1 deletions(-) I'm OK with the pci/quirks.c part of this, but the bulk of the interesting code is in intel-iommu.c, so I assume the IOMMU folks will take care of this. Bjorn diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 0099667..f53f3e3 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -1674,6 +1674,69 @@ static int domain_context_mapping_one(struct dmar_domain *domain, int segment, return 0; } +static void iommu_detach_dev(struct intel_iommu *iommu, u8 bus, u8 devfn); + +static void unmap_ghost_dma_fn(struct pci_dev *pdev, u8 fn_map) +{ + u8 fn; + struct intel_iommu *iommu; + + iommu = device_to_iommu(pci_domain_nr(pdev-bus), pdev-bus-number, + pdev-devfn); + + /* something must be seriously fubar if we can't lookup the iommu. */ + BUG_ON(!iommu); + + for (fn = 0; fn = 7 fn_map fn; fn++) { + if (fn == PCI_FUNC(pdev-devfn)) + continue; + if (fn_map (1fn)) { + iommu_detach_dev(iommu, + pdev-bus-number, + PCI_DEVFN(PCI_SLOT(pdev-devfn), fn)); + dev_dbg(pdev-dev, quirk; ghost func %d unmapped, + fn); + } + } +} + +/* For quirky devices like Marvell 88SE91xx chips that use ghost functions. */ +static int map_ghost_dma_fn(struct dmar_domain *domain, + struct pci_dev *pdev, + int translation) +{ + u8 fn, fn_map; + u8 fn_mapped = 0; + int err = 0; + + fn_map = pci_get_dma_source_map(pdev); + + /* this is the common, non-quirky case. */ + if (!fn_map) + return 0; + + for (fn = 0; fn = 7 fn_map fn; fn++) { + if (fn == PCI_FUNC(pdev-devfn)) + continue; + if (fn_map (1fn)) { + err = domain_context_mapping_one(domain, + pci_domain_nr(pdev-bus), + pdev-bus-number, + PCI_DEVFN(PCI_SLOT(pdev-devfn), fn), + translation); + if (err) { + dev_err(pdev-dev, + mapping ghost func %d failed, fn); + unmap_ghost_dma_fn(pdev, fn_mapped); + return err; + } + dev_dbg(pdev-dev, quirk; ghost func %d mapped, fn); + fn_mapped |= (1fn); + } + } + return 0; +} + static int domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev, int translation) @@ -1687,6 +1750,11 @@ domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev, if (ret) return ret; + /* quirk for undeclared/ghost pci functions */ + ret = map_ghost_dma_fn(domain, pdev, translation); + if (ret) + return ret; + /* dependent device mapping */ tmp = pci_find_upstream_pcie_bridge(pdev); if (!tmp) @@ -3786,6 +3854,7 @@ static void domain_remove_one_dev_info(struct
Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets
On Thu, Apr 4, 2013 at 11:51 AM, Neil Horman nhor...@tuxdriver.com wrote: Oh, you want the bug report that I'm fixing this against? Sure, I can do that. I thought you wanted me to include a url in the WARN_TAINT, with which user could report occurances of this bug. Yeah, the bug that this is reported in is: https://bugzilla.redhat.com/show_bug.cgi?id=887006 Its standing in for about a dozen or so variants of this issue we've seen Exactly -- I'm just hoping for something in the changelog. BTW, this particular bugzilla is not public. Regardless, theres also the security issue to consider here - namely that disabling irq remapping opens up users of virt to a possible security bug (potential irq injection). Some users may wish to live with the remapping error, given that error typically leads to devices that need to be restarted/reset to start working again, rather than live with the security hole. I rather like the warning, that gives users a choice, but I'll spin up a version that just disables it if you would rather. I don't believe users will want to make a choice like that or even be sophisticated enough to do it, at least not based on something in dmesg. I'm pretty sure I'm not :) The only supportable thing I can imagine doing would be: - Disable interrupt remapping if this chipset defect is present, so devices work reliably (they don't need whatever restart/reset you referred to above). - Disable virt functionality when interrupt remapping is disabled to avoid the security problem (I don't know the details of this.) - Add a command-line option to enable interrupt remapping (I think intremap=on is currently parsed too early, but maybe this could be reworked so the option could override the quirk disable). - Add release notes saying boot with 'intremap=on' if you want the virt functionality and can accept unreliable devices. That way the default behavior is safe and reliable (though perhaps lacking some functionality), and you have told the user a way to get safe and unreliable operation if he's willing to accept that. At least, that's what I think I would want if I were in RH's shoes. Bjorn ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: RFC: vfio API changes needed for powerpc (v2)
On Thu, 2013-04-04 at 17:32 +, Yoder Stuart-B08248 wrote: Based on the email thread over the last couple of days, I have below an more concrete proposal (v2) for new ioctls supporting vfio-pci on SoCs with the Freescale PAMU. Example usage is as described by Scott: count = VFIO_IOMMU_GET_MSI_BANK_COUNT VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY) VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS) // do other DMA maps now, or later, or not at all, doesn't matter for (i = 0; i count; i++) VFIO_IOMMU_MAP_MSI_BANK(iova, i); // The kernel now knows where each bank has been mapped, and can // update PCI config space appropriately. Thanks, Stuart The Freescale PAMU is an aperture-based IOMMU with the following characteristics. Each device has an entry in a table in memory describing the iova-phys mapping. The mapping has: -an overall aperture that is power of 2 sized, and has a start iova that is naturally aligned -has 1 or more windows within the aperture -number of windows must be power of 2, max is 256 -size of each window is determined by aperture size / # of windows -iova of each window is determined by aperture start iova / # of windows -the mapped region in each window can be different than the window size...mapping must power of 2 -physical address of the mapping must be naturally aligned with the mapping size /* * VFIO_PAMU_GET_ATTR * * Gets the iommu attributes for the current vfio container. This * ioctl is applicable to an iommu type of VFIO_PAMU only. * Caller sets argsz and attribute. The ioctl fills in * the provided struct vfio_pamu_attr based on the attribute * value that was set. * Operates on VFIO file descriptor (/dev/vfio/vfio). * Return: 0 on success, -errno on failure */ struct vfio_pamu_attr { __u32 argsz; __u32 attribute; #define VFIO_ATTR_GEOMETRY 1 #define VFIO_ATTR_WINDOWS 2 #define VFIO_ATTR_PAMU_STASH 3 /* fowlling fields are for VFIO_ATTR_GEOMETRY */ __u64 aperture_start; /* first address that can be mapped*/ __u64 aperture_end; /* last address that can be mapped */ /* follwing fields are for VFIO_ATTR_WINDOWS */ __u32 windows;/* number of windows in the aperture */ /* initially this will be the max number * of windows that can be set */ /* following fields are for VFIO_ATTR_PAMU_STASH */ __u32 cpu;/* CPU number for stashing */ __u32 cache; /* cache ID for stashing */ Can multiple attribute bits be set at once? If not then the above could be a union and the attribute could be an enum. A flags field is probably still a good idea. }; #define VFIO_PAMU_GET_ATTR _IO(VFIO_TYPE, VFIO_BASE + x, struct vfio_pamu_attr) /* * VFIO_PAMU_SET_ATTR * * Sets the iommu attributes for the current vfio container. This * ioctl is applicable to an iommu type of VFIO_PAMU only. * Caller sets struct vfio_pamu attr, including argsz and attribute and * setting any fields that are valid for the attribute. * Operates on VFIO file descriptor (/dev/vfio/vfio). * Return: 0 on success, -errno on failure */ #define VFIO_PAMU_SET_ATTR _IO(VFIO_TYPE, VFIO_BASE + x, struct vfio_pamu_attr) /* * VFIO_PAMU_GET_MSI_BANK_COUNT * * Returns the number of MSI banks for this platform. This tells user space * how many aperture windows should be reserved for MSI banks when setting * the PAMU geometry and window count. * Fills in provided struct vfio_pamu_msi_banks. Caller sets argsz. * Operates on VFIO file descriptor (/dev/vfio/vfio). * Return: 0 on success, -errno on failure */ struct vfio_pamu_msi_banks { __u32 argsz; __u32 bank_count; /* the number of MSI }; #define VFIO_PAMU_GET_MSI_BANK_COUNT _IO(VFIO_TYPE, VFIO_BASE + x, struct vfio_pamu_msi_banks) argsz w/o flags doesn't really buy us much. /* * VFIO_PAMU_MAP_MSI_BANK * * Maps the MSI bank at the specified index and iova. User space must * call this ioctl once for each MSI bank (count of banks is returned by * VFIO_IOMMU_GET_MSI_BANK_COUNT). * Caller provides struct vfio_pamu_msi_bank_map with all fields set. * Operates on VFIO file descriptor (/dev/vfio/vfio). * Return: 0 on success, -errno on failure */ struct vfio_pamu_msi_bank_map { __u32 argsz; __u32 msi_bank_index; /* the index of the MSI bank */ __u64 iova; /* the iova the bank is to be mapped to */ }; Again, flags. If we dynamically add or remove devices from a container the bank count can change, right? If bank count goes from 2 to 3, does userspace know assume the new bank is [2]? If bank count goes from 3
RFC: vfio API changes needed for powerpc (v3)
-v3 updates -made vfio_pamu_attr a union, added flags -s/VFIO_PAMU_/VFIO_IOMMU_PAMU_/ for the ioctls to make it more clear which fd is being operated on -added flags to vfio_pamu_msi_bank_map/umap -VFIO_PAMU_GET_MSI_BANK_COUNT now just returns a __u32 not a struct -fixed some typos The Freescale PAMU is an aperture-based IOMMU with the following characteristics. Each device has an entry in a table in memory describing the iova-phys mapping. The mapping has: -an overall aperture that is power of 2 sized, and has a start iova that is naturally aligned -has 1 or more windows within the aperture -number of windows must be power of 2, max is 256 -size of each window is determined by aperture size / # of windows -iova of each window is determined by aperture start iova / # of windows -the mapped region in each window can be different than the window size...mapping must power of 2 -physical address of the mapping must be naturally aligned with the mapping size These ioctls operate on the VFIO file descriptor (/dev/vfio/vfio). /* * VFIO_IOMMU_PAMU_GET_ATTR * * Gets the iommu attributes for the current vfio container. This * ioctl is applicable to an iommu type of VFIO_PAMU only. * Caller sets argsz and attribute. The ioctl fills in * the provided struct vfio_pamu_attr based on the attribute * value that was set. * Return: 0 on success, -errno on failure */ struct vfio_pamu_attr { __u32 argsz; __u32 flags;/* no flags currently */ __u32 attribute; union { /* VFIO_ATTR_GEOMETRY */ struct { __u64 aperture_start; /* first addr that can be mapped */ __u64 aperture_end; /* last addr that can be mapped */ } attr; /* VFIO_ATTR_WINDOWS */ __u32 windows; /* number of windows in the aperture */ /* initially this will be the max number * of windows that can be set */ /* VFIO_ATTR_PAMU_STASH */ struct { __u32 cpu; /* CPU number for stashing */ __u32 cache; /* cache ID for stashing */ } stash; } }; #define VFIO_IOMMU_PAMU_GET_ATTR _IO(VFIO_TYPE, VFIO_BASE + x, struct vfio_pamu_attr) /* * VFIO_IOMMU_PAMU_SET_ATTR * * Sets the iommu attributes for the current vfio container. This * ioctl is applicable to an iommu type of VFIO_PAMU only. * Caller sets struct vfio_pamu attr, including argsz and attribute and * setting any fields that are valid for the attribute. * Return: 0 on success, -errno on failure */ #define VFIO_IOMMU_PAMU_SET_ATTR _IO(VFIO_TYPE, VFIO_BASE + x, struct vfio_pamu_attr) /* * VFIO_IOMMU_PAMU_GET_MSI_BANK_COUNT * * Returns the number of MSI banks for this platform. This tells user space * how many aperture windows should be reserved for MSI banks when setting * the PAMU geometry and window count. * Return: __u32 bank count on success, -errno on failure */ #define VFIO_IOMMU_PAMU_GET_MSI_BANK_COUNT _IO(VFIO_TYPE, VFIO_BASE + x, __u32) /* * VFIO_IOMMU_PAMU_MAP_MSI_BANK * * Maps the MSI bank at the specified index and iova. User space must * call this ioctl once for each MSI bank (count of banks is returned by * VFIO_IOMMU_PAMU_GET_MSI_BANK_COUNT). * Caller provides struct vfio_pamu_msi_bank_map with all fields set. * Return: 0 on success, -errno on failure */ struct vfio_pamu_msi_bank_map { __u32 argsz; __u32 flags; /* no flags currently */ __u32 msi_bank_index; /* the index of the MSI bank */ __u64 iova; /* the iova the bank is to be mapped to */ }; #define VFIO_IOMMU_PAMU_MAP_MSI_BANK _IO(VFIO_TYPE, VFIO_BASE + x, struct vfio_pamu_msi_bank_map ) /* * VFIO_IOMMU_PAMU_UNMAP_MSI_BANK * * Unmaps the MSI bank at the specified iova. * Caller provides struct vfio_pamu_msi_bank_unmap with all fields set. * Operates on VFIO file descriptor (/dev/vfio/vfio). * Return: 0 on success, -errno on failure */ struct vfio_pamu_msi_bank_unmap { __u32 argsz; __u32 flags; /* no flags currently */ __u64 iova; /* the iova to be unmapped to */ }; #define VFIO_IOMMU_PAMU_UNMAP_MSI_BANK _IO(VFIO_TYPE, VFIO_BASE + x, struct vfio_pamu_msi_bank_unmap ) ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: RFC: vfio API changes needed for powerpc (v2)
-Original Message- From: Wood Scott-B07421 Sent: Thursday, April 04, 2013 5:52 PM To: Yoder Stuart-B08248 Cc: Alex Williamson; Wood Scott-B07421; ag...@suse.de; Bhushan Bharat-R65777; Sethi Varun-B16395; k...@vger.kernel.org; qemu-de...@nongnu.org; iommu@lists.linux-foundation.org Subject: Re: RFC: vfio API changes needed for powerpc (v2) On 04/04/2013 04:38:44 PM, Yoder Stuart-B08248 wrote: /* * VFIO_PAMU_MAP_MSI_BANK * * Maps the MSI bank at the specified index and iova. User space must * call this ioctl once for each MSI bank (count of banks is returned by * VFIO_IOMMU_GET_MSI_BANK_COUNT). * Caller provides struct vfio_pamu_msi_bank_map with all fields set. * Operates on VFIO file descriptor (/dev/vfio/vfio). * Return: 0 on success, -errno on failure */ struct vfio_pamu_msi_bank_map { __u32 argsz; __u32 msi_bank_index; /* the index of the MSI bank */ __u64 iova; /* the iova the bank is to be mapped to */ }; Again, flags. If we dynamically add or remove devices from a container the bank count can change, right? If bank count goes from 2 to 3, does userspace know assume the new bank is [2]? If bank count goes from 3 to 2, which index was that? If removing a device removes an MSI bank then vfio-pamu will automatically do the unmap, right? My assumption is that the bank count returned by VFIO_IOMMU_GET_MSI_BANK_COUNT is the max bank count for a platform. (number will mostly likely always be 3 or 4). So it won't change as devices are added or removed. It should be the actual number of banks used. This is required if we're going to have userspace do the iteration and specify the exact iovas to use -- and even if we aren't going to do that, it would be more restrictive on available iova-space than is necessary. Usually there will be only one bank in the group. Actually mapping all of the MSI banks, all the time, would completely negate the point of using the separate alias pages. The geometry, windows, DMA mappings, etc is set on a 'container' which may have multiple groups in it. So user space needs to determine the total number of MSI windows needed when setting the geometry and window count. In the flow you proposed: count = VFIO_IOMMU_GET_MSI_BANK_COUNT VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY) VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS) // do other DMA maps now, or later, or not at all, doesn't matter for (i = 0; i count; i++) VFIO_IOMMU_MAP_MSI_BANK(iova, i); ...that count has to be the total, not the count for 1 of N possible groups. So the get count ioctl is not done on a group. However, like you pointed out we don't want to negate isolation of the separate alias pages. All this API is doing is telling the kernel which windows to use for which MSI banks. It's up to the kernel to actually enable them as needed. Say 3 MSI banks exist. If there are no groups added to the vfio container and all 3 MAP_MSI_BANK calls occurred the picture may look like this (based on my earlier example): win gphys/ # enabled iovaphys size --- --- 5 N 0x1000 0xf_fe044000 4KB// msi bank 1 6 N 0x1400 0xf_fe045000 4KB// msi bank 2 7 N 0x1800 0xf_fe046000 4KB// msi bank 3 User space adds 2 groups that use bank 1: win gphys/ # enabled iovaphys size --- --- 5 Y 0x1000 0xf_fe044000 4KB// msi bank 1 6 N 0x1400 0xf_fe045000 4KB// msi bank 2 7 N 0x1800 0xf_fe046000 4KB// msi bank 3 User space adds another group that uses bank 3: win gphys/ # enabled iovaphys size --- --- 5 Y 0x1000 0xf_fe044000 4KB// msi bank 1 6 N 0x1400 0xf_fe045000 4KB// msi bank 2 7 Y 0x1800 0xf_fe046000 4KB// msi bank 3 User space doesn't need to care what is actually enabled, it just needs to the the kernel which windows to use and the kernel can take care of the rest. Stuart ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation.
-Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Thursday, April 04, 2013 10:14 PM To: Sethi Varun-B16395 Cc: Joerg Roedel; Yoder Stuart-B08248; Wood Scott-B07421; iommu@lists.linux-foundation.org; linuxppc-...@lists.ozlabs.org; linux- ker...@vger.kernel.org; ga...@kernel.crashing.org; b...@kernel.crashing.org Subject: Re: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation. On Thu, 2013-04-04 at 16:35 +, Sethi Varun-B16395 wrote: -Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Thursday, April 04, 2013 8:52 PM To: Sethi Varun-B16395 Cc: Joerg Roedel; Yoder Stuart-B08248; Wood Scott-B07421; iommu@lists.linux-foundation.org; linuxppc-...@lists.ozlabs.org; linux- ker...@vger.kernel.org; ga...@kernel.crashing.org; b...@kernel.crashing.org Subject: Re: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation. On Thu, 2013-04-04 at 13:00 +, Sethi Varun-B16395 wrote: -Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Wednesday, April 03, 2013 11:32 PM To: Joerg Roedel Cc: Sethi Varun-B16395; Yoder Stuart-B08248; Wood Scott-B07421; iommu@lists.linux-foundation.org; linuxppc-...@lists.ozlabs.org; linux- ker...@vger.kernel.org; ga...@kernel.crashing.org; b...@kernel.crashing.org Subject: Re: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation. On Tue, 2013-04-02 at 18:18 +0200, Joerg Roedel wrote: Cc'ing Alex Williamson Alex, can you please review the iommu-group part of this patch? Sure, it looks pretty reasonable. AIUI, all PCI devices are below some kind of host bridge that is either new and supports partitioning or old and doesn't. I don't know if that's a visibility or isolation requirement, perhaps PCI ACS-ish. In the new host bridge case, each device gets a group. This seems not to have any quirks for multifunction devices though. On AMD and Intel IOMMUs we test multifunction device ACS support to determine whether all the functions should be in the same group. Is there any reason to trust multifunction devices on PAMU? [Sethi Varun-B16395] In the case where we can partition endpoints we can distinguish transactions based on the bus,device,function number combination. This support is available in the PCIe controller (host bridge). So can x86 IOMMUs, that's the visibility aspect of IOMMU groups. Visibility alone doesn't necessarily imply that a device is isolated though. A multifunction PCI device that doesn't expose ACS support may not isolate functions from each other. For example a peer-to-peer DMA between functions may not be translated by the upstream IOMMU. IOMMU groups should encompass both visibility and isolation. [Sethi Varun-B16395] We can isolate the DMA access to the host based on the to the pci bus,device,function number. The IOMMU can only isolate DMA that it can see. A multifunction device may never expose peer-to-peer DMA to the upstream device, it's implementation specific. The ACS flags allow that possibility to be controlled and prevented. I thought that was enough to put devices in to separate iommu groups. This is a PCIe controller property which allows us to partition PCIe devices. But, what I can understand from your point is that we also need to consider isolation at PCIe device level as well. I will check for the case of multifunction devices. I also find it curious what happens to the iommu group of the host bridge. In the partitionable case the host bridge group is removed, in the non-partitionable case the host bridge group becomes the group for the children, removing the host bridge. It's unique to PAMU so far that these host bridges are even in an iommu group (x86 only adds pci devices), but I don't see it as necessarily wrong leaving it in either scenario. Does it solve some problem to remove them from the groups? Thanks, [Sethi Varun-B16395] The PCIe controller isn't a partitionable entity, it would always be owned by the host. Ownership of a device shouldn't play into the group context. An IOMMU group should be defined by it's visibility and isolation from other devices. Whether the PCIe controller is allowed to be handed to userspace is a question for VFIO. [Sethi Varun-B16395] The problem is in the case, where we can't partition PCIe devices. PCIe devices share the same device group as the PCI controller. This becomes a problem while assigning the devices to the guest, as you are required to unbind all the PCIe devices including the controller from the host. PCIe controller can't be unbound from the host, so we simply delete the
Ricoh DMAR bug returns? (WAS Re: [PATCH v4] Quirk for buggy dma source tags with Intel IOMMU.)
On Tue, Apr 2, 2013 at 11:47 PM, Pat Erley pat-l...@erley.org wrote: On 04/02/2013 10:50 AM, Andrew Cooks wrote: On 2 Apr 2013 15:37, Pat Erley pat-l...@erley.org mailto:pat-l...@erley.org wrote: On 03/07/2013 09:35 PM, Andrew Cooks wrote: --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c +/* Table of multiple (ghost) source functions. This is similar to the + * translated sources above, but with the following differences: + * 1. the device may use multiple functions as DMA sources, + * 2. these functions cannot be assumed to be actual devices, they're simply + * incorrect DMA tags. + * 3. the specific ghost function for a request can not always be predicted. + * For example, the actual device could be xx:yy.1 and it could use + * both 0 and 1 for different requests, with no obvious way to tell when + * DMA will be tagged as comming from xx.yy.0 and and when it will be tagged + * as comming from xx.yy.1. + * The bitmap contains all of the functions used in DMA tags, including the + * actual device. + * See https://bugzilla.redhat.com/show_bug.cgi?id=757166, + * https://bugzilla.kernel.org/show_bug.cgi?id=42679 + * https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1089768 + */ +static const struct pci_dev_dma_multi_func_sources { + u16 vendor; + u16 device; + u8 func_map;/* bit map. lsb is fn 0. */ +} pci_dev_dma_multi_func_sources[] = { + { PCI_VENDOR_ID_MARVELL_2, 0x9123, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9125, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9128, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9130, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9143, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9172, (10)|(11)}, + { 0 } +}; Adding another buggy device. I have a Ricoh multifunction device: 17:00.0 SD Host controller: Ricoh Co Ltd MMC/SD Host Controller (rev 01) 17:00.3 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 PCIe IEEE 1394 Controller (rev 01) 17:00.0 0805: 1180:e822 (rev 01) 17:00.3 0c00: 1180:e832 (rev 01) The Ricoh device issue has been known for some time and a quirk has been available since commit 12ea6cad1c7d046 in June 2012. It's slightly different than the problem this patch tries to work around [1]. Hmm, I've had this problem with many recent (vanilla) kernels, up to and including 3.9-rc5 that adding entries for also fixed booting. I don't have any SD cards or firewire devices handy to test that they work, but the system now boots, which was not the case without your patch and IOMMU/DMAR enabled. That is really strange. Could you tell us what kernel version you tested and provide dmesg output? I'll capture a vanilla 3.8.5 boot without any patches and iommu=off, then try to find another machine to catch what I can of a netconsole boot with iommu=on. What's the preferred way to send these? pastebin links? I'd been running the 'dirty' fix that's in the redhat bugzilla entry. I checked my .config and have CONFIG_PCI_QUIRKS=y, and verified my devices are in the quirks table for the pci_func_0_dma_source fixup. Do you mean that even though your hardware is specifically listed in the quirk table, the quirk simply hasn't worked for you? That would be unfortunate, to say the least. The fedora kernel included a separate patch for this issue until recently (see https://bugzilla.redhat.com/show_bug.cgi?id=880051). It basically just disabled DMAR when the Ricoh device is present, the same as the patch to the mailing list you mentioned. Is the dirty fix you're referring to comment 7? https://bugzilla.redhat.com/show_bug.cgi?id=605888#c7 [1] In the Ricoh case, multiple functions are used for real devices and the bug is that these devices all use function 0 during DMA. In this particular case, I'd expect the FireWire device 17:00.3 to issue DMA from the SD Host Controller address 17:00.0. The quirk is not too much of a terrible hack - it's a fairly simple translation. In the Marvell case, the real device uses DMA source tags that don't actually belong to any visible devices. The quirk to make this work is more invasive, not nearly as elegant and has not attracted much enthusiasm from subsystem maintainers, though I'm still hopeful that a quirk will be merged in some form or another. Thanks for explaining the difference! Pat ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Ricoh DMAR bug returns? (WAS Re: [PATCH v4] Quirk for buggy dma source tags with Intel IOMMU.)
On 04/05/2013 12:44 AM, Andrew Cooks wrote: On Tue, Apr 2, 2013 at 11:47 PM, Pat Erley pat-l...@erley.org wrote: On 04/02/2013 10:50 AM, Andrew Cooks wrote: On 2 Apr 2013 15:37, Pat Erley pat-l...@erley.org mailto:pat-l...@erley.org wrote: On 03/07/2013 09:35 PM, Andrew Cooks wrote: --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c +/* Table of multiple (ghost) source functions. This is similar to the + * translated sources above, but with the following differences: + * 1. the device may use multiple functions as DMA sources, + * 2. these functions cannot be assumed to be actual devices, they're simply + * incorrect DMA tags. + * 3. the specific ghost function for a request can not always be predicted. + * For example, the actual device could be xx:yy.1 and it could use + * both 0 and 1 for different requests, with no obvious way to tell when + * DMA will be tagged as comming from xx.yy.0 and and when it will be tagged + * as comming from xx.yy.1. + * The bitmap contains all of the functions used in DMA tags, including the + * actual device. + * See https://bugzilla.redhat.com/show_bug.cgi?id=757166, + * https://bugzilla.kernel.org/show_bug.cgi?id=42679 + * https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1089768 + */ +static const struct pci_dev_dma_multi_func_sources { + u16 vendor; + u16 device; + u8 func_map;/* bit map. lsb is fn 0. */ +} pci_dev_dma_multi_func_sources[] = { + { PCI_VENDOR_ID_MARVELL_2, 0x9123, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9125, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9128, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9130, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9143, (10)|(11)}, + { PCI_VENDOR_ID_MARVELL_2, 0x9172, (10)|(11)}, + { 0 } +}; Adding another buggy device. I have a Ricoh multifunction device: 17:00.0 SD Host controller: Ricoh Co Ltd MMC/SD Host Controller (rev 01) 17:00.3 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 PCIe IEEE 1394 Controller (rev 01) 17:00.0 0805: 1180:e822 (rev 01) 17:00.3 0c00: 1180:e832 (rev 01) The Ricoh device issue has been known for some time and a quirk has been available since commit 12ea6cad1c7d046 in June 2012. It's slightly different than the problem this patch tries to work around [1]. Hmm, I've had this problem with many recent (vanilla) kernels, up to and including 3.9-rc5 that adding entries for also fixed booting. I don't have any SD cards or firewire devices handy to test that they work, but the system now boots, which was not the case without your patch and IOMMU/DMAR enabled. That is really strange. Could you tell us what kernel version you tested and provide dmesg output? I'll capture a vanilla 3.8.5 boot without any patches and iommu=off, then try to find another machine to catch what I can of a netconsole boot with iommu=on. What's the preferred way to send these? pastebin links? I'd been running the 'dirty' fix that's in the redhat bugzilla entry. I checked my .config and have CONFIG_PCI_QUIRKS=y, and verified my devices are in the quirks table for the pci_func_0_dma_source fixup. Do you mean that even though your hardware is specifically listed in the quirk table, the quirk simply hasn't worked for you? That would be unfortunate, to say the least. Precisely. The fedora kernel included a separate patch for this issue until recently (see https://bugzilla.redhat.com/show_bug.cgi?id=880051). It basically just disabled DMAR when the Ricoh device is present, the same as the patch to the mailing list you mentioned. Yes, that's what I've been avoiding doing. Every new release, I boot once with iommu=on, and firewire blacklisted, boot up, load the firewire driver. This has caused the 'Ricoh DMAR' bug on every kernel since I got the laptop. I then reboot and Is the dirty fix you're referring to comment 7? https://bugzilla.redhat.com/show_bug.cgi?id=605888#c7 Apply this patch, which has worked fine for me, but per a commend in a thread I created here on 10/19/2012[1], this has a potential significant performance impact. In my use case, a performance hit is worth the cost for the features. However, your patch(while not intended to be the fix), actually solves the issue on my machine. I don't know if it also has the potential performance impact, but it's certainly not noticeably worse in my use case. Pat Erley [1] http://marc.info/?l=linux-pcim=135094489232548w=2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu