Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets

2013-04-04 Thread Neil Horman
On Wed, Apr 03, 2013 at 05:53:27PM -0600, Bjorn Helgaas wrote:
 [+cc David and iommu list, Yinghai, Jiang]
 
 On Mon, Mar 4, 2013 at 12:04 PM, Neil Horman nhor...@tuxdriver.com wrote:
  A few years back intel published a spec update:
  http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf
 
  For the 5520 and 5500 chipsets which contained an errata (specificially 
  errata
  53), which noted that these chipsets can't properly do interrupt remapping, 
  and
  as a result the recommend that interrupt remapping be disabled in bios.  
  While
  many vendors have a bios update to do exactly that, not all do, and of 
  course
  not all users update their bios to a level that corrects the problem.  As a
  result, occasionally interrupts can arrive at a cpu even after affinity for 
  that
  interrupt has be moved, leading to lost or spurrious interrupts (usually
  characterized by the message:
  kernel: do_IRQ: 7.71 No irq handler for vector (irq -1)
 
  There have been several incidents recently of people seeing this error, and
  investigation has shown that they have system for which their BIOS level is 
  such
  that this feature was not properly turned off.  As such, it would be good to
  give them a reminder that their systems are vulnurable to this problem.
 
  Signed-off-by: Neil Horman nhor...@tuxdriver.com
  CC: Prarit Bhargava pra...@redhat.com
  CC: Don Zickus dzic...@redhat.com
  CC: Don Dutile ddut...@redhat.com
  CC: Bjorn Helgaas bhelg...@google.com
  CC: Asit Mallick asit.k.mall...@intel.com
  CC: linux-...@vger.kernel.org
 
  ---
 
  Change notes:
 
  v2)
 
  * Moved the quirk to the x86 arch, since consensus seems to be that the 55XX
  chipset series is x86 only.  I decided however to keep the quirk as a 
  regular
  quirk, not an early_quirk.  Early quirks have no way currently to determine 
  if
  BIOS has properly disabled the feature in the iommu, at least not without
  significant hacking, and since its quite possible this will be a short lived
  quirk, should Don Z's workaround code prove successful (and it looks like 
  it may
  well), I don't think that necessecary.
 
  * Removed the WARNING banner from the quirk, and added the HW_ERR token to 
  the
  string, I opted to leave the newlines in place however, as I really couldnt
  find a way to keep the text on a single line is still legible from a code
  perspective.  I think theres enough language in there that using cscope on 
  just
  about any substring however will turn it up, and again, this may be a short
  lived quirk.
  ---
   arch/x86/kernel/quirks.c | 18 ++
   include/linux/pci_ids.h  |  2 ++
   2 files changed, 20 insertions(+)
 
  diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
  index 26ee48a..a718ea2 100644
  --- a/arch/x86/kernel/quirks.c
  +++ b/arch/x86/kernel/quirks.c
  @@ -5,6 +5,7 @@
   #include linux/irq.h
 
   #include asm/hpet.h
  +#include ../../../drivers/iommu/irq_remapping.h
 
   #if defined(CONFIG_X86_IO_APIC)  defined(CONFIG_SMP)  
  defined(CONFIG_PCI)
 
  @@ -567,3 +568,20 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 
  PCI_DEVICE_ID_AMD_15H_NB_F5,
  quirk_amd_nb_node);
 
   #endif
  +
  +static void intel_remapping_check(struct pci_dev *dev)
  +{
  +   u8 revision;
  +
  +   pci_read_config_byte(dev, PCI_REVISION_ID, revision);
  +
  +   if ((revision == 0x13)  irq_remapping_enabled) {
  +pr_warn(HW_ERR This system BIOS has enabled interrupt 
  remapping\n
  +on a chipset that contains an errata making 
  that\n
  +feature unstable.  Please reboot with 
  nointremap\n
  +added to the kernel command line and contact\n
  +your BIOS vendor for an update);
  +   }
  +}
  +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 
  PCI_DEVICE_ID_INTEL_5520_IOHUB, intel_remapping_check);
  +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 
  PCI_DEVICE_ID_INTEL_5500_IOHUB, intel_remapping_check);
 
 This started as an IOMMU change, and I'm not an expert in that area,
 so I added David and the IOMMU list.  I'd rather have him deal with
 this than me.
 
This was never an iommu change, other than the fact that interrupt mapping and
the iommu are tightly coupled.

 Is this something we can just *fix* in the kernel, e.g., by turning
 off interrupt remapping ourselves, or does it have to be done before
 the OS boots?
 
Possibly, but it didn't seem safe to me.  As it currently stands, pci quirks are
executed after the apic is configured, which means its possible the problem
might trigger before the quirk has a chance to execute (even the early quirk).
There was an exchange about this earlier in the thread IIRC.

  diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
  index 31717bd..54027a6 100644
  --- a/include/linux/pci_ids.h
  +++ b/include/linux/pci_ids.h
  @@ -2732,6 +2732,8 @@
   

RE: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation.

2013-04-04 Thread Sethi Varun-B16395


 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Wednesday, April 03, 2013 11:32 PM
 To: Joerg Roedel
 Cc: Sethi Varun-B16395; Yoder Stuart-B08248; Wood Scott-B07421;
 iommu@lists.linux-foundation.org; linuxppc-...@lists.ozlabs.org; linux-
 ker...@vger.kernel.org; ga...@kernel.crashing.org;
 b...@kernel.crashing.org
 Subject: Re: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu
 implementation.
 
 On Tue, 2013-04-02 at 18:18 +0200, Joerg Roedel wrote:
  Cc'ing Alex Williamson
 
  Alex, can you please review the iommu-group part of this patch?
 
 Sure, it looks pretty reasonable.  AIUI, all PCI devices are below some
 kind of host bridge that is either new and supports partitioning or old
 and doesn't.  I don't know if that's a visibility or isolation
 requirement, perhaps PCI ACS-ish.  In the new host bridge case, each
 device gets a group.  This seems not to have any quirks for multifunction
 devices though.  On AMD and Intel IOMMUs we test multifunction device ACS
 support to determine whether all the functions should be in the same
 group.  Is there any reason to trust multifunction devices on PAMU?
 
[Sethi Varun-B16395] In the case where we can partition endpoints we can 
distinguish transactions based on the bus,device,function number combination. 
This support is available in the PCIe controller (host bridge).

 I also find it curious what happens to the iommu group of the host
 bridge.  In the partitionable case the host bridge group is removed, in
 the non-partitionable case the host bridge group becomes the group for
 the children, removing the host bridge.  It's unique to PAMU so far that
 these host bridges are even in an iommu group (x86 only adds pci
 devices), but I don't see it as necessarily wrong leaving it in either
 scenario.  Does it solve some problem to remove them from the groups?
 Thanks,
[Sethi Varun-B16395] The PCIe controller isn't a partitionable entity, it would 
always be owned by the host.

-Varun

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] intel-iommu: Synchronize gcmd value with global command register

2013-04-04 Thread David Woodhouse
On Thu, 2013-04-04 at 14:48 +0900, Takao Indoh wrote:
 
 - DMAR fault messages floods and second kernel does not boot. Recently I
   saw similar report. https://lkml.org/lkml/2013/3/8/120

Right. So the fix for that is to make the subsequent errors silent,
until/unless we actually get a request to create a mapping for the given
device.

 - igb driver detectes error on linkup and kdump via network fails.

That's a driver bug, IIRC. It was failing to completely reset the
hardware. It's fixed now, isn't it?

 - On a certain platform, though kdump itself works, PCIe error like
   Unexpected Completion is detected and it gets hardware degraded.

More information required.

-- 
dwmw2


smime.p7s
Description: S/MIME cryptographic signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets

2013-04-04 Thread Neil Horman
On Thu, Apr 04, 2013 at 08:57:06AM -0600, Bjorn Helgaas wrote:
 On Thu, Apr 4, 2013 at 8:50 AM, Neil Horman nhor...@tuxdriver.com wrote:
  On Thu, Apr 04, 2013 at 03:27:29PM +0100, David Woodhouse wrote:
  On Wed, 2013-04-03 at 17:53 -0600, Bjorn Helgaas wrote:
   );
+
+   if ((revision == 0x13)  irq_remapping_enabled) {
+pr_warn(HW_ERR This system BIOS has enabled 
interrupt remapping\n
+on a chipset that contains an errata making 
that\n
+feature unstable.  Please reboot with 
nointremap\n
+added to the kernel command line and 
contact\n
+your BIOS vendor for an update);
 
  This should be WARN_TAINT(TAINT_FIRMWARE_WORKAROUND). And 'an erratum'.
 
  Ok, copy that. I'll repost shortly
 
 When you do, please include URLs for any problem reports or bugzillas you 
 have.
 
Well, those are going to be vendor specific, so I'm not sure we can really do
that, at least not in any meaningful way.

 I assume Windows just works in this situation?
No more or less than linux does in this case.  The Intel provided errata
indicates that the only acceptable workaround is to disable remapping in the
BIOS, so I would presume that if a windows system has a BIOS that doesn't
implement this fix, its just as exposed as we are.
Neil

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation.

2013-04-04 Thread Sethi Varun-B16395


 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Thursday, April 04, 2013 8:52 PM
 To: Sethi Varun-B16395
 Cc: Joerg Roedel; Yoder Stuart-B08248; Wood Scott-B07421;
 iommu@lists.linux-foundation.org; linuxppc-...@lists.ozlabs.org; linux-
 ker...@vger.kernel.org; ga...@kernel.crashing.org;
 b...@kernel.crashing.org
 Subject: Re: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu
 implementation.
 
 On Thu, 2013-04-04 at 13:00 +, Sethi Varun-B16395 wrote:
 
   -Original Message-
   From: Alex Williamson [mailto:alex.william...@redhat.com]
   Sent: Wednesday, April 03, 2013 11:32 PM
   To: Joerg Roedel
   Cc: Sethi Varun-B16395; Yoder Stuart-B08248; Wood Scott-B07421;
   iommu@lists.linux-foundation.org; linuxppc-...@lists.ozlabs.org;
   linux- ker...@vger.kernel.org; ga...@kernel.crashing.org;
   b...@kernel.crashing.org
   Subject: Re: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and
   iommu implementation.
  
   On Tue, 2013-04-02 at 18:18 +0200, Joerg Roedel wrote:
Cc'ing Alex Williamson
   
Alex, can you please review the iommu-group part of this patch?
  
   Sure, it looks pretty reasonable.  AIUI, all PCI devices are below
   some kind of host bridge that is either new and supports
   partitioning or old and doesn't.  I don't know if that's a
   visibility or isolation requirement, perhaps PCI ACS-ish.  In the
   new host bridge case, each device gets a group.  This seems not to
   have any quirks for multifunction devices though.  On AMD and Intel
   IOMMUs we test multifunction device ACS support to determine whether
   all the functions should be in the same group.  Is there any reason
 to trust multifunction devices on PAMU?
  
  [Sethi Varun-B16395] In the case where we can partition endpoints we
  can distinguish transactions based on the bus,device,function number
  combination. This support is available in the PCIe controller (host
  bridge).
 
 So can x86 IOMMUs, that's the visibility aspect of IOMMU groups.
 Visibility alone doesn't necessarily imply that a device is isolated
 though.  A multifunction PCI device that doesn't expose ACS support may
 not isolate functions from each other.  For example a peer-to-peer DMA
 between functions may not be translated by the upstream IOMMU.  IOMMU
 groups should encompass both visibility and isolation.
[Sethi Varun-B16395] We can isolate the DMA access to the host based on the to 
the pci bus,device,function number.
I thought that was enough to put devices in to separate iommu groups. This is a 
PCIe controller property which allows us to partition PCIe devices. But, what I 
can understand from your point is that we also need to consider isolation at 
PCIe device level as well. I will check for the case of multifunction devices.

 
   I also find it curious what happens to the iommu group of the host
   bridge.  In the partitionable case the host bridge group is removed,
   in the non-partitionable case the host bridge group becomes the
   group for the children, removing the host bridge.  It's unique to
   PAMU so far that these host bridges are even in an iommu group (x86
   only adds pci devices), but I don't see it as necessarily wrong
   leaving it in either scenario.  Does it solve some problem to remove
 them from the groups?
   Thanks,
  [Sethi Varun-B16395] The PCIe controller isn't a partitionable entity,
  it would always be owned by the host.
 
 Ownership of a device shouldn't play into the group context.  An IOMMU
 group should be defined by it's visibility and isolation from other
 devices.  Whether the PCIe controller is allowed to be handed to
 userspace is a question for VFIO.
[Sethi Varun-B16395] The problem is in the case, where we can't partition PCIe 
devices. PCIe devices share the same device group as the PCI controller. This 
becomes a problem while assigning the devices to the guest, as you are required 
to unbind all the PCIe devices including the controller from the host. PCIe 
controller can't be unbound from the host, so we simply delete the controller 
iommu_group.

-Varun
 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] documentation: iommu: add description of ARM System MMU binding

2013-04-04 Thread Will Deacon
This patch adds a description of the device tree binding for the ARM
System MMU architecture.

Cc: Rob Herring robherri...@gmail.com
Cc: Andreas Herrmann andreas.herrm...@calxeda.com
Signed-off-by: Will Deacon will.dea...@arm.com
---

Hello,

The driver for this is still a WIP. Both Andreas and myself have prototype
code, but we're planning to merge that together to get something more
general. Deciding on the binding is a good first step.

All comments welcome,

Will

 .../devicetree/bindings/iommu/arm,smmu.txt | 61 ++
 1 file changed, 61 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/iommu/arm,smmu.txt

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt 
b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
new file mode 100644
index 000..938325f
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -0,0 +1,61 @@
+* ARM System MMU Architecture Implementation
+
+ARM SoCs may contain an implementation of the ARM System Memory
+Management Unit Architecture, which can be used to provide 1 or 2 stages
+of address translation to bus masters external to the CPU.
+
+The SMMU may also raise interrupts in response to various fault
+conditions.
+
+** System MMU required properties:
+
+- compatible: Should be one of arm,smmu-v1 or arm,smmu-v2
+  depending on the version of the architecture
+  implemented.
+
+- reg   : Base address and size of the SMMU.
+
+- #global-interrupts : The number of global interrupts exposed by the
+   device.
+
+- interrupts: Interrupt list, with the first #global-irqs entries
+  corresponding to the global interrupts and any
+  following entries corresponding to context interrupts,
+  specified in order of their indexing by the SMMU.
+
+- mmu-masters   : A list of phandles to device nodes representing bus
+  masters for which the SMMU can provide a translation.
+
+- stream-ids: A list of 16-bit values corresponding to the StreamIDs
+  for the devices listed in the mmu-masters property.
+  This list must be same length as mmu-masters, so
+  masters with multiple stream-ids will have multiple
+  entries in mmu-masters.
+
+** System MMU optional properties:
+
+- smmu-parent   : When multiple SMMUs are chained together, this
+  property can be used to provide a phandle to the
+  parent SMMU (that is the next SMMU on the path going
+  from the mmu-masters towards memory) node for this
+  SMMU.
+
+Example:
+
+smmu {
+compatible = arm,smmu-v1;
+reg = 0xba5e 0x1;
+#global-interrupts = 2;
+interrupts = 0 32 4,
+ 0 33 4,
+ 0 34 4, /* This is the first context interrupt 
*/
+ 0 35 4,
+ 0 36 4,
+ 0 37 4;
+mmu-masters = dma0,
+  dma0,
+  dma1;
+stream-ids  = 0xd01d,
+  0xd01e,
+  0xd11c;
+};
-- 
1.8.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets

2013-04-04 Thread Bjorn Helgaas
On Thu, Apr 4, 2013 at 9:39 AM, Neil Horman nhor...@tuxdriver.com wrote:
 On Thu, Apr 04, 2013 at 08:57:06AM -0600, Bjorn Helgaas wrote:
 On Thu, Apr 4, 2013 at 8:50 AM, Neil Horman nhor...@tuxdriver.com wrote:
  On Thu, Apr 04, 2013 at 03:27:29PM +0100, David Woodhouse wrote:
  On Wed, 2013-04-03 at 17:53 -0600, Bjorn Helgaas wrote:
   );
+
+   if ((revision == 0x13)  irq_remapping_enabled) {
+pr_warn(HW_ERR This system BIOS has enabled 
interrupt remapping\n
+on a chipset that contains an errata making 
that\n
+feature unstable.  Please reboot with 
nointremap\n
+added to the kernel command line and 
contact\n
+your BIOS vendor for an update);
 
  This should be WARN_TAINT(TAINT_FIRMWARE_WORKAROUND). And 'an erratum'.
 
  Ok, copy that. I'll repost shortly

 When you do, please include URLs for any problem reports or bugzillas you 
 have.

 Well, those are going to be vendor specific, so I'm not sure we can really do
 that, at least not in any meaningful way.

Sorry, I don't understand your point.  It's useful to know who
reported it (e.g., for future testers) and what happened and what
bugzillas it solved.  Of course it applies only to machines with this
chipset and certain BIOS revisions.

 I assume Windows just works in this situation?
 No more or less than linux does in this case.  The Intel provided errata
 indicates that the only acceptable workaround is to disable remapping in the
 BIOS, so I would presume that if a windows system has a BIOS that doesn't
 implement this fix, its just as exposed as we are.

It sounds like the effect of this bug is that on Linux, devices may
not work at all because of lost interrupts.  Either Windows must never
enable remapping (so it never sees the bug), or it must be designed in
a way that tolerates the problem.  I can't believe these machines
shipped with Windows certification if devices didn't work correctly.

Either way, I don't understand why we can't make the quirk just fix
this.  Booting with nointremap only sets disable_irq_remap, which is
only used by irq_remapping_supported().  Early quirks are run before
irq_remapping_supported () is ever called, so an early quirk ought to
be just as effective as the command line option.  Here's the relevant
call tree I see:

  start_kernel
setup_arch
  parse_early_param
  early_quirks
rest_init
  ...
first use of disable_irq_remap

The x86 setup_arch() does call generic_apic_probe(), but as far as I
can tell, none of the APIC .probe() methods reference
disable_irq_remap, so that doesn't look like a problem.

Bjorn
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4] Quirk for buggy dma source tags with Intel IOMMU.

2013-04-04 Thread Bjorn Helgaas
On Thu, Mar 7, 2013 at 7:35 PM, Andrew Cooks aco...@gmail.com wrote:
 This patch creates a quirk to allow the Intel IOMMU to be enabled for devices
 that use incorrect tags during DMA. It is similar to the quirk for Ricoh
 devices, but allows mapping multiple functions and mapping of 'ghost'
 functions that do not correspond to real devices. Devices that need this
 include a variety of Marvell 88SE91xx based SATA controllers. [1][2]

 Changelog:
 v4: Process feedback received from Alex Williamson.
  * don't assume function 0 is a real device.
  * exit early if no ghost functions are known, or all known functions have
been mapped.
  * cleanup failure case so mapping succeeds or fails for all ghost functions
per device.
  * improve comments.

 v3:
  * Adopt David Woodhouse's terminology by referring to the quirky functions as
  'ghost' functions.
  * Unmap ghost functions when device is detached from IOMMU.
  * Stub function for when CONFIG_PCI_QUIRKS is not enabled.


  This patch was generated against 3.9-rc1, but will also apply to 3.7.10.

  Bug reports:
  1. https://bugzilla.redhat.com/show_bug.cgi?id=757166
  2. https://bugzilla.kernel.org/show_bug.cgi?id=42679

 Signed-off-by: Andrew Cooks aco...@gmail.com
 ---
  drivers/iommu/intel-iommu.c |   69 
 +++
  drivers/pci/quirks.c|   67 +-
  include/linux/pci.h |5 +++
  include/linux/pci_ids.h |1 +
  4 files changed, 141 insertions(+), 1 deletions(-)

I'm OK with the pci/quirks.c part of this, but the bulk of the
interesting code is in intel-iommu.c, so I assume the IOMMU folks will
take care of this.

Bjorn

 diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
 index 0099667..f53f3e3 100644
 --- a/drivers/iommu/intel-iommu.c
 +++ b/drivers/iommu/intel-iommu.c
 @@ -1674,6 +1674,69 @@ static int domain_context_mapping_one(struct 
 dmar_domain *domain, int segment,
 return 0;
  }

 +static void iommu_detach_dev(struct intel_iommu *iommu, u8 bus, u8 devfn);
 +
 +static void unmap_ghost_dma_fn(struct pci_dev *pdev, u8 fn_map)
 +{
 +   u8 fn;
 +   struct intel_iommu *iommu;
 +
 +   iommu = device_to_iommu(pci_domain_nr(pdev-bus), pdev-bus-number,
 +   pdev-devfn);
 +
 +   /* something must be seriously fubar if we can't lookup the iommu. */
 +   BUG_ON(!iommu);
 +
 +   for (fn = 0; fn = 7  fn_map  fn; fn++) {
 +   if (fn == PCI_FUNC(pdev-devfn))
 +   continue;
 +   if (fn_map  (1fn)) {
 +   iommu_detach_dev(iommu,
 +   pdev-bus-number,
 +   PCI_DEVFN(PCI_SLOT(pdev-devfn), fn));
 +   dev_dbg(pdev-dev, quirk; ghost func %d unmapped,
 +   fn);
 +   }
 +   }
 +}
 +
 +/* For quirky devices like Marvell 88SE91xx chips that use ghost functions. 
 */
 +static int map_ghost_dma_fn(struct dmar_domain *domain,
 +   struct pci_dev *pdev,
 +   int translation)
 +{
 +   u8 fn, fn_map;
 +   u8 fn_mapped = 0;
 +   int err = 0;
 +
 +   fn_map = pci_get_dma_source_map(pdev);
 +
 +   /* this is the common, non-quirky case. */
 +   if (!fn_map)
 +   return 0;
 +
 +   for (fn = 0; fn = 7  fn_map  fn; fn++) {
 +   if (fn == PCI_FUNC(pdev-devfn))
 +   continue;
 +   if (fn_map  (1fn)) {
 +   err = domain_context_mapping_one(domain,
 +   pci_domain_nr(pdev-bus),
 +   pdev-bus-number,
 +   PCI_DEVFN(PCI_SLOT(pdev-devfn), fn),
 +   translation);
 +   if (err) {
 +   dev_err(pdev-dev,
 +   mapping ghost func %d failed, fn);
 +   unmap_ghost_dma_fn(pdev, fn_mapped);
 +   return err;
 +   }
 +   dev_dbg(pdev-dev, quirk; ghost func %d mapped, 
 fn);
 +   fn_mapped |= (1fn);
 +   }
 +   }
 +   return 0;
 +}
 +
  static int
  domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev,
 int translation)
 @@ -1687,6 +1750,11 @@ domain_context_mapping(struct dmar_domain *domain, 
 struct pci_dev *pdev,
 if (ret)
 return ret;

 +   /* quirk for undeclared/ghost pci functions */
 +   ret = map_ghost_dma_fn(domain, pdev, translation);
 +   if (ret)
 +   return ret;
 +
 /* dependent device mapping */
 tmp = pci_find_upstream_pcie_bridge(pdev);
 if (!tmp)
 @@ -3786,6 +3854,7 @@ static void domain_remove_one_dev_info(struct 

Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets

2013-04-04 Thread Bjorn Helgaas
On Thu, Apr 4, 2013 at 11:51 AM, Neil Horman nhor...@tuxdriver.com wrote:

 Oh, you want the bug report that I'm fixing this against?  Sure, I can do 
 that.
 I thought you wanted me to include a url in the WARN_TAINT, with which user
 could report occurances of this bug.  Yeah, the bug that this is reported in 
 is:
 https://bugzilla.redhat.com/show_bug.cgi?id=887006

 Its standing in for about a dozen or so variants of this issue we've seen

Exactly -- I'm just hoping for something in the changelog.  BTW, this
particular bugzilla is not public.

 Regardless, theres also the security issue to consider here - namely that
 disabling irq remapping opens up users of virt to a possible security bug
 (potential irq injection).  Some users may wish to live with the remapping
 error, given that error typically leads to devices that need to be
 restarted/reset to start working again, rather than live with the security 
 hole.
 I rather like the warning, that gives users a choice, but I'll spin up a 
 version
 that just disables it if you would rather.

I don't believe users will want to make a choice like that or even be
sophisticated enough to do it, at least not based on something in
dmesg.  I'm pretty sure I'm not  :)

The only supportable thing I can imagine doing would be:

  - Disable interrupt remapping if this chipset defect is present, so
devices work reliably (they don't need whatever restart/reset you
referred to above).
  - Disable virt functionality when interrupt remapping is disabled to
avoid the security problem (I don't know the details of this.)
  - Add a command-line option to enable interrupt remapping (I think
intremap=on is currently parsed too early, but maybe this could be
reworked so the option could override the quirk disable).
  - Add release notes saying boot with 'intremap=on' if you want the
virt functionality and can accept unreliable devices.

That way the default behavior is safe and reliable (though perhaps
lacking some functionality), and you have told the user a way to get
safe and unreliable operation if he's willing to accept that.  At
least, that's what I think I would want if I were in RH's shoes.

Bjorn
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: RFC: vfio API changes needed for powerpc (v2)

2013-04-04 Thread Alex Williamson
On Thu, 2013-04-04 at 17:32 +, Yoder Stuart-B08248 wrote:
 Based on the email thread over the last couple of days, I have
 below an more concrete proposal (v2) for new ioctls supporting vfio-pci
 on SoCs with the Freescale PAMU.
 
 Example usage is as described by Scott:
 
 count = VFIO_IOMMU_GET_MSI_BANK_COUNT
 VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY)
 VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS)
 // do other DMA maps now, or later, or not at all, doesn't matter
 for (i = 0; i  count; i++)
 VFIO_IOMMU_MAP_MSI_BANK(iova, i);
 // The kernel now knows where each bank has been mapped, and can
 //   update PCI config space appropriately.
 
 Thanks,
 Stuart
 
 
 
 The Freescale PAMU is an aperture-based IOMMU with the following 
 characteristics.  Each device has an entry in a table in memory
 describing the iova-phys mapping. The mapping has:
-an overall aperture that is power of 2 sized, and has a start iova that
 is naturally aligned
-has 1 or more windows within the aperture
   -number of windows must be power of 2, max is 256
   -size of each window is determined by aperture size / # of windows
   -iova of each window is determined by aperture start iova / # of windows
   -the mapped region in each window can be different than
the window size...mapping must power of 2 
   -physical address of the mapping must be naturally aligned
with the mapping size
 
 /*
  * VFIO_PAMU_GET_ATTR
  *
  * Gets the iommu attributes for the current vfio container.  This 
  * ioctl is applicable to an iommu type of VFIO_PAMU only.
  * Caller sets argsz and attribute.  The ioctl fills in
  * the provided struct vfio_pamu_attr based on the attribute
  * value that was set.
  * Operates on VFIO file descriptor (/dev/vfio/vfio).
  * Return: 0 on success, -errno on failure
  */
 struct vfio_pamu_attr {
 __u32   argsz;
 __u32   attribute;
 #define VFIO_ATTR_GEOMETRY 1
 #define VFIO_ATTR_WINDOWS  2
 #define VFIO_ATTR_PAMU_STASH   3
 
 /* fowlling fields are for VFIO_ATTR_GEOMETRY */
 __u64 aperture_start; /* first address that can be mapped*/
 __u64 aperture_end;   /* last address that can be mapped */
 
 /* follwing fields are for VFIO_ATTR_WINDOWS */
 __u32 windows;/* number of windows in the aperture */
   /* initially this will be the max number
* of windows that can be set
*/
 
 /* following fields are for VFIO_ATTR_PAMU_STASH */
 __u32 cpu;/* CPU number for stashing */
 __u32 cache;  /* cache ID for stashing */

Can multiple attribute bits be set at once?  If not then the above could
be a union and the attribute could be an enum.  A flags field is
probably still a good idea.

 };
 #define VFIO_PAMU_GET_ATTR  _IO(VFIO_TYPE, VFIO_BASE + x,
 struct vfio_pamu_attr)
 
 /*
  * VFIO_PAMU_SET_ATTR
  *
  * Sets the iommu attributes for the current vfio container.  This 
  * ioctl is applicable to an iommu type of VFIO_PAMU only.
  * Caller sets struct vfio_pamu attr, including argsz and attribute and
  * setting any fields that are valid for the attribute.
  * Operates on VFIO file descriptor (/dev/vfio/vfio).
  * Return: 0 on success, -errno on failure
  */
 #define VFIO_PAMU_SET_ATTR  _IO(VFIO_TYPE, VFIO_BASE + x,
 struct vfio_pamu_attr)
 
 /*
  * VFIO_PAMU_GET_MSI_BANK_COUNT
  *
  * Returns the number of MSI banks for this platform.  This tells user space
  * how many aperture windows should be reserved for MSI banks when setting
  * the PAMU geometry and window count.
  * Fills in provided struct vfio_pamu_msi_banks. Caller sets argsz. 
  * Operates on VFIO file descriptor (/dev/vfio/vfio).
  * Return: 0 on success, -errno on failure
  */
 struct vfio_pamu_msi_banks {
 __u32   argsz;
 __u32   bank_count;  /* the number of MSI
 };
 #define VFIO_PAMU_GET_MSI_BANK_COUNT  _IO(VFIO_TYPE, VFIO_BASE + x,
 struct vfio_pamu_msi_banks)

argsz w/o flags doesn't really buy us much.

 
 /*
  * VFIO_PAMU_MAP_MSI_BANK
  *
  * Maps the MSI bank at the specified index and iova.  User space must
  * call this ioctl once for each MSI bank (count of banks is returned by
  * VFIO_IOMMU_GET_MSI_BANK_COUNT).
  * Caller provides struct vfio_pamu_msi_bank_map with all fields set.
  * Operates on VFIO file descriptor (/dev/vfio/vfio).
  * Return: 0 on success, -errno on failure
  */
 
 struct vfio_pamu_msi_bank_map {
 __u32   argsz;
 __u32   msi_bank_index;  /* the index of the MSI bank */
 __u64   iova;  /* the iova the bank is to be mapped to */
 };

Again, flags.  If we dynamically add or remove devices from a container
the bank count can change, right?  If bank count goes from 2 to 3, does
userspace know assume the new bank is [2]?  If bank count goes from 3 

RFC: vfio API changes needed for powerpc (v3)

2013-04-04 Thread Yoder Stuart-B08248
-v3 updates
   -made vfio_pamu_attr a union, added flags
   -s/VFIO_PAMU_/VFIO_IOMMU_PAMU_/ for the ioctls to make it more
clear which fd is being operated on
   -added flags to vfio_pamu_msi_bank_map/umap
   -VFIO_PAMU_GET_MSI_BANK_COUNT now just returns a __u32
not a struct
   -fixed some typos



The Freescale PAMU is an aperture-based IOMMU with the following
characteristics.  Each device has an entry in a table in memory
describing the iova-phys mapping. The mapping has:
   -an overall aperture that is power of 2 sized, and has a start iova that
is naturally aligned
   -has 1 or more windows within the aperture
  -number of windows must be power of 2, max is 256
  -size of each window is determined by aperture size / # of windows
  -iova of each window is determined by aperture start iova / # of windows
  -the mapped region in each window can be different than
   the window size...mapping must power of 2
  -physical address of the mapping must be naturally aligned
   with the mapping size

These ioctls operate on the VFIO file descriptor (/dev/vfio/vfio).

/*
 * VFIO_IOMMU_PAMU_GET_ATTR
 *
 * Gets the iommu attributes for the current vfio container.  This
 * ioctl is applicable to an iommu type of VFIO_PAMU only.
 * Caller sets argsz and attribute.  The ioctl fills in
 * the provided struct vfio_pamu_attr based on the attribute
 * value that was set.

 * Return: 0 on success, -errno on failure
 */
struct vfio_pamu_attr {
__u32   argsz;
__u32   flags;/* no flags currently */
__u32   attribute;

union {
/* VFIO_ATTR_GEOMETRY */
struct { 
__u64 aperture_start; /* first addr that can be mapped 
*/
__u64 aperture_end;  /* last addr that can be mapped */
} attr;

/* VFIO_ATTR_WINDOWS */
__u32 windows;  /* number of windows in the aperture */
/* initially this will be the max number
 * of windows that can be set
 */

/* VFIO_ATTR_PAMU_STASH */
struct { 
__u32 cpu; /* CPU number for stashing */
__u32 cache;   /* cache ID for stashing */
} stash;
}
};
#define VFIO_IOMMU_PAMU_GET_ATTR  _IO(VFIO_TYPE, VFIO_BASE + x,
struct vfio_pamu_attr)

/*
 * VFIO_IOMMU_PAMU_SET_ATTR
 *
 * Sets the iommu attributes for the current vfio container.  This
 * ioctl is applicable to an iommu type of VFIO_PAMU only.
 * Caller sets struct vfio_pamu attr, including argsz and attribute and
 * setting any fields that are valid for the attribute.
 * Return: 0 on success, -errno on failure
 */
#define VFIO_IOMMU_PAMU_SET_ATTR  _IO(VFIO_TYPE, VFIO_BASE + x,
struct vfio_pamu_attr)

/*
 * VFIO_IOMMU_PAMU_GET_MSI_BANK_COUNT
 *
 * Returns the number of MSI banks for this platform.  This tells user space
 * how many aperture windows should be reserved for MSI banks when setting
 * the PAMU geometry and window count.
 * Return: __u32 bank count on success, -errno on failure
 */
#define VFIO_IOMMU_PAMU_GET_MSI_BANK_COUNT _IO(VFIO_TYPE, VFIO_BASE + x, __u32)

/*
 * VFIO_IOMMU_PAMU_MAP_MSI_BANK
 *
 * Maps the MSI bank at the specified index and iova.  User space must
 * call this ioctl once for each MSI bank (count of banks is returned by
 * VFIO_IOMMU_PAMU_GET_MSI_BANK_COUNT).
 * Caller provides struct vfio_pamu_msi_bank_map with all fields set.
 * Return: 0 on success, -errno on failure
 */

struct vfio_pamu_msi_bank_map {
__u32   argsz;
__u32   flags; /* no flags currently */
__u32   msi_bank_index;  /* the index of the MSI bank */
__u64   iova;  /* the iova the bank is to be mapped to */
};
#define VFIO_IOMMU_PAMU_MAP_MSI_BANK  _IO(VFIO_TYPE, VFIO_BASE + x,
struct vfio_pamu_msi_bank_map )

/*
 * VFIO_IOMMU_PAMU_UNMAP_MSI_BANK
 *
 * Unmaps the MSI bank at the specified iova.
 * Caller provides struct vfio_pamu_msi_bank_unmap with all fields set.
 * Operates on VFIO file descriptor (/dev/vfio/vfio).
 * Return: 0 on success, -errno on failure
 */

struct vfio_pamu_msi_bank_unmap {
__u32   argsz;
__u32   flags; /* no flags currently */
__u64   iova;  /* the iova to be unmapped to */
};
#define VFIO_IOMMU_PAMU_UNMAP_MSI_BANK  _IO(VFIO_TYPE, VFIO_BASE + x,
struct vfio_pamu_msi_bank_unmap )

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: RFC: vfio API changes needed for powerpc (v2)

2013-04-04 Thread Yoder Stuart-B08248


 -Original Message-
 From: Wood Scott-B07421
 Sent: Thursday, April 04, 2013 5:52 PM
 To: Yoder Stuart-B08248
 Cc: Alex Williamson; Wood Scott-B07421; ag...@suse.de; Bhushan Bharat-R65777; 
 Sethi Varun-B16395;
 k...@vger.kernel.org; qemu-de...@nongnu.org; iommu@lists.linux-foundation.org
 Subject: Re: RFC: vfio API changes needed for powerpc (v2)
 
 On 04/04/2013 04:38:44 PM, Yoder Stuart-B08248 wrote:
 
/*
 * VFIO_PAMU_MAP_MSI_BANK
 *
 * Maps the MSI bank at the specified index and iova.  User space
  must
 * call this ioctl once for each MSI bank (count of banks is
  returned by
 * VFIO_IOMMU_GET_MSI_BANK_COUNT).
 * Caller provides struct vfio_pamu_msi_bank_map with all fields
  set.
 * Operates on VFIO file descriptor (/dev/vfio/vfio).
 * Return: 0 on success, -errno on failure
 */
   
struct vfio_pamu_msi_bank_map {
__u32   argsz;
__u32   msi_bank_index;  /* the index of the MSI bank */
__u64   iova;  /* the iova the bank is to be mapped
  to */
};
  
   Again, flags.  If we dynamically add or remove devices from a
  container
   the bank count can change, right?  If bank count goes from 2 to 3,
  does
   userspace know assume the new bank is [2]?  If bank count goes from
  3 to
   2, which index was that?  If removing a device removes an MSI bank
  then
   vfio-pamu will automatically do the unmap, right?
 
  My assumption is that the bank count returned by
  VFIO_IOMMU_GET_MSI_BANK_COUNT
  is the max bank count for a platform.  (number will mostly likely
  always be
  3 or 4).  So it won't change as devices are added or removed.
 
 It should be the actual number of banks used.  This is required if
 we're going to have userspace do the iteration and specify the exact
 iovas to use -- and even if we aren't going to do that, it would be
 more restrictive on available iova-space than is necessary.  Usually
 there will be only one bank in the group.
 
 Actually mapping all of the MSI banks, all the time, would completely
 negate the point of using the separate alias pages.

The geometry, windows, DMA mappings, etc is set on a 'container'
which may have multiple groups in it.  So user space needs to
determine the total number of MSI windows needed when setting
the geometry and window count.

In the flow you proposed:

count = VFIO_IOMMU_GET_MSI_BANK_COUNT
VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY)
VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS)
// do other DMA maps now, or later, or not at all, doesn't matter
for (i = 0; i  count; i++)
VFIO_IOMMU_MAP_MSI_BANK(iova, i);

...that count has to be the total, not the count for 1 of
N possible groups.   So the get count ioctl is not done on
a group.

However, like you pointed out we don't want to negate isolation 
of the separate alias pages.  All this API is doing is telling
the kernel which windows to use for which MSI banks.   It's up
to the kernel to actually enable them as needed.

Say 3 MSI banks exist.  If there are no groups added to the
vfio container and all 3 MAP_MSI_BANK calls occurred
the picture may look like this (based on my earlier example):

   win gphys/
#   enabled iovaphys  size
--- ---   
5 N 0x1000  0xf_fe044000  4KB// msi bank 1
6 N 0x1400  0xf_fe045000  4KB// msi bank 2
7 N 0x1800  0xf_fe046000  4KB// msi bank 3

User space adds 2 groups that use bank 1:

   win  gphys/
#   enabled iovaphys  size
--- ---   
5 Y 0x1000  0xf_fe044000  4KB// msi bank 1
6 N 0x1400  0xf_fe045000  4KB// msi bank 2
7 N 0x1800  0xf_fe046000  4KB// msi bank 3

User space adds another group that uses bank 3:

   win  gphys/
#   enabled iovaphys  size
--- ---   
5 Y 0x1000  0xf_fe044000  4KB// msi bank 1
6 N 0x1400  0xf_fe045000  4KB// msi bank 2
7 Y 0x1800  0xf_fe046000  4KB// msi bank 3

User space doesn't need to care what is actually enabled,
it just needs to the the kernel which windows to use
and the kernel can take care of the rest.

Stuart


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation.

2013-04-04 Thread Sethi Varun-B16395


 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Thursday, April 04, 2013 10:14 PM
 To: Sethi Varun-B16395
 Cc: Joerg Roedel; Yoder Stuart-B08248; Wood Scott-B07421;
 iommu@lists.linux-foundation.org; linuxppc-...@lists.ozlabs.org; linux-
 ker...@vger.kernel.org; ga...@kernel.crashing.org;
 b...@kernel.crashing.org
 Subject: Re: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu
 implementation.
 
 On Thu, 2013-04-04 at 16:35 +, Sethi Varun-B16395 wrote:
 
   -Original Message-
   From: Alex Williamson [mailto:alex.william...@redhat.com]
   Sent: Thursday, April 04, 2013 8:52 PM
   To: Sethi Varun-B16395
   Cc: Joerg Roedel; Yoder Stuart-B08248; Wood Scott-B07421;
   iommu@lists.linux-foundation.org; linuxppc-...@lists.ozlabs.org;
   linux- ker...@vger.kernel.org; ga...@kernel.crashing.org;
   b...@kernel.crashing.org
   Subject: Re: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and
   iommu implementation.
  
   On Thu, 2013-04-04 at 13:00 +, Sethi Varun-B16395 wrote:
   
 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Wednesday, April 03, 2013 11:32 PM
 To: Joerg Roedel
 Cc: Sethi Varun-B16395; Yoder Stuart-B08248; Wood Scott-B07421;
 iommu@lists.linux-foundation.org; linuxppc-...@lists.ozlabs.org;
 linux- ker...@vger.kernel.org; ga...@kernel.crashing.org;
 b...@kernel.crashing.org
 Subject: Re: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver
 and iommu implementation.

 On Tue, 2013-04-02 at 18:18 +0200, Joerg Roedel wrote:
  Cc'ing Alex Williamson
 
  Alex, can you please review the iommu-group part of this patch?

 Sure, it looks pretty reasonable.  AIUI, all PCI devices are
 below some kind of host bridge that is either new and supports
 partitioning or old and doesn't.  I don't know if that's a
 visibility or isolation requirement, perhaps PCI ACS-ish.  In
 the new host bridge case, each device gets a group.  This seems
 not to have any quirks for multifunction devices though.  On AMD
 and Intel IOMMUs we test multifunction device ACS support to
 determine whether all the functions should be in the same group.
 Is there any reason
   to trust multifunction devices on PAMU?

[Sethi Varun-B16395] In the case where we can partition endpoints
we can distinguish transactions based on the bus,device,function
number combination. This support is available in the PCIe
controller (host bridge).
  
   So can x86 IOMMUs, that's the visibility aspect of IOMMU groups.
   Visibility alone doesn't necessarily imply that a device is isolated
   though.  A multifunction PCI device that doesn't expose ACS support
   may not isolate functions from each other.  For example a
   peer-to-peer DMA between functions may not be translated by the
   upstream IOMMU.  IOMMU groups should encompass both visibility and
 isolation.
  [Sethi Varun-B16395] We can isolate the DMA access to the host based
  on the to the pci bus,device,function number.
 
 The IOMMU can only isolate DMA that it can see.  A multifunction device
 may never expose peer-to-peer DMA to the upstream device, it's
 implementation specific.  The ACS flags allow that possibility to be
 controlled and prevented.
 
  I thought that was enough to put devices in to separate iommu groups.
  This is a PCIe controller property which allows us to partition PCIe
  devices. But, what I can understand from your point is that we also
  need to consider isolation at PCIe device level as well. I will check
  for the case of multifunction devices.
 
  
 I also find it curious what happens to the iommu group of the
 host bridge.  In the partitionable case the host bridge group is
 removed, in the non-partitionable case the host bridge group
 becomes the group for the children, removing the host bridge.
 It's unique to PAMU so far that these host bridges are even in
 an iommu group (x86 only adds pci devices), but I don't see it
 as necessarily wrong leaving it in either scenario.  Does it
 solve some problem to remove
   them from the groups?
 Thanks,
[Sethi Varun-B16395] The PCIe controller isn't a partitionable
entity, it would always be owned by the host.
  
   Ownership of a device shouldn't play into the group context.  An
   IOMMU group should be defined by it's visibility and isolation from
   other devices.  Whether the PCIe controller is allowed to be handed
   to userspace is a question for VFIO.
  [Sethi Varun-B16395] The problem is in the case, where we can't
  partition PCIe devices. PCIe devices share the same device group as
  the PCI controller. This becomes a problem while assigning the devices
  to the guest, as you are required to unbind all the PCIe devices
  including the controller from the host. PCIe controller can't be
  unbound from the host, so we simply delete the 

Ricoh DMAR bug returns? (WAS Re: [PATCH v4] Quirk for buggy dma source tags with Intel IOMMU.)

2013-04-04 Thread Andrew Cooks
On Tue, Apr 2, 2013 at 11:47 PM, Pat Erley pat-l...@erley.org wrote:
 On 04/02/2013 10:50 AM, Andrew Cooks wrote:

 On 2 Apr 2013 15:37, Pat Erley pat-l...@erley.org
 mailto:pat-l...@erley.org wrote:
  
   On 03/07/2013 09:35 PM, Andrew Cooks wrote:
  
   --- a/drivers/pci/quirks.c
   +++ b/drivers/pci/quirks.c
  
   +/* Table of multiple (ghost) source functions. This is similar to the
   + * translated sources above, but with the following differences:
   + * 1. the device may use multiple functions as DMA sources,
   + * 2. these functions cannot be assumed to be actual devices,
 they're simply
   + * incorrect DMA tags.
   + * 3. the specific ghost function for a request can not always be
 predicted.
   + * For example, the actual device could be xx:yy.1 and it could use
   + * both 0 and 1 for different requests, with no obvious way to tell
 when
   + * DMA will be tagged as comming from xx.yy.0 and and when it will
 be tagged
   + * as comming from xx.yy.1.
   + * The bitmap contains all of the functions used in DMA tags,
 including the
   + * actual device.
   + * See https://bugzilla.redhat.com/show_bug.cgi?id=757166,
   + * https://bugzilla.kernel.org/show_bug.cgi?id=42679
   + * https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1089768
   + */
   +static const struct pci_dev_dma_multi_func_sources {
   +   u16 vendor;
   +   u16 device;
   +   u8 func_map;/* bit map. lsb is fn 0. */
   +} pci_dev_dma_multi_func_sources[] = {
   +   { PCI_VENDOR_ID_MARVELL_2, 0x9123, (10)|(11)},
   +   { PCI_VENDOR_ID_MARVELL_2, 0x9125, (10)|(11)},
   +   { PCI_VENDOR_ID_MARVELL_2, 0x9128, (10)|(11)},
   +   { PCI_VENDOR_ID_MARVELL_2, 0x9130, (10)|(11)},
   +   { PCI_VENDOR_ID_MARVELL_2, 0x9143, (10)|(11)},
   +   { PCI_VENDOR_ID_MARVELL_2, 0x9172, (10)|(11)},
   +   { 0 }
   +};
  
  
   Adding another buggy device.  I have a Ricoh multifunction device:
  
   17:00.0 SD Host controller: Ricoh Co Ltd MMC/SD Host Controller (rev
 01)
   17:00.3 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 PCIe IEEE 1394
   Controller (rev 01)
  
   17:00.0 0805: 1180:e822 (rev 01)
   17:00.3 0c00: 1180:e832 (rev 01)
  

 The Ricoh device issue has been known for some time and a quirk has been
 available since commit 12ea6cad1c7d046 in June 2012.  It's slightly
 different than the problem this patch tries to work around [1].


 Hmm, I've had this problem with many recent (vanilla) kernels, up to and
 including 3.9-rc5


   that adding entries for also fixed booting.  I don't have any SD
 cards or firewire devices handy to test that they work, but the system
 now boots, which was not the case without your patch and IOMMU/DMAR
 enabled.

 That is really strange. Could you tell us what kernel version you tested
 and provide dmesg output?


 I'll capture a vanilla 3.8.5 boot without any patches and iommu=off, then
 try to find another machine to catch what I can of a netconsole boot with
 iommu=on.  What's the preferred way to send these?  pastebin links?

 I'd been running the 'dirty' fix that's in the redhat bugzilla entry.  I
 checked my .config and have CONFIG_PCI_QUIRKS=y, and verified my devices are
 in the quirks table for the pci_func_0_dma_source fixup.

Do you mean that even though your hardware is specifically listed in
the quirk table, the quirk simply hasn't worked for you? That would be
unfortunate, to say the least.

The fedora kernel included a separate patch for this issue until
recently (see https://bugzilla.redhat.com/show_bug.cgi?id=880051).  It
basically just disabled DMAR when the Ricoh device is present, the
same as the patch to the mailing list you mentioned.

Is the dirty fix you're referring to comment 7?
https://bugzilla.redhat.com/show_bug.cgi?id=605888#c7


 [1] In the Ricoh case, multiple functions are used for real devices and
 the bug is that these devices all use function 0 during DMA. In this
 particular case, I'd expect the FireWire device 17:00.3 to issue DMA
 from the SD Host Controller address 17:00.0. The quirk is not too much
 of a terrible hack - it's a fairly simple translation.

 In the Marvell case, the real device uses DMA source tags that don't
 actually belong to any visible devices. The quirk to make this work is
 more invasive, not nearly as elegant and has not attracted much
 enthusiasm from subsystem maintainers, though I'm still hopeful that a
 quirk will be merged in some form or another.


 Thanks for explaining the difference!

 Pat
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Ricoh DMAR bug returns? (WAS Re: [PATCH v4] Quirk for buggy dma source tags with Intel IOMMU.)

2013-04-04 Thread Pat Erley

On 04/05/2013 12:44 AM, Andrew Cooks wrote:

On Tue, Apr 2, 2013 at 11:47 PM, Pat Erley pat-l...@erley.org wrote:

On 04/02/2013 10:50 AM, Andrew Cooks wrote:


On 2 Apr 2013 15:37, Pat Erley pat-l...@erley.org
mailto:pat-l...@erley.org wrote:
  
   On 03/07/2013 09:35 PM, Andrew Cooks wrote:
  
   --- a/drivers/pci/quirks.c
   +++ b/drivers/pci/quirks.c
  
   +/* Table of multiple (ghost) source functions. This is similar to the
   + * translated sources above, but with the following differences:
   + * 1. the device may use multiple functions as DMA sources,
   + * 2. these functions cannot be assumed to be actual devices,
they're simply
   + * incorrect DMA tags.
   + * 3. the specific ghost function for a request can not always be
predicted.
   + * For example, the actual device could be xx:yy.1 and it could use
   + * both 0 and 1 for different requests, with no obvious way to tell
when
   + * DMA will be tagged as comming from xx.yy.0 and and when it will
be tagged
   + * as comming from xx.yy.1.
   + * The bitmap contains all of the functions used in DMA tags,
including the
   + * actual device.
   + * See https://bugzilla.redhat.com/show_bug.cgi?id=757166,
   + * https://bugzilla.kernel.org/show_bug.cgi?id=42679
   + * https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1089768
   + */
   +static const struct pci_dev_dma_multi_func_sources {
   +   u16 vendor;
   +   u16 device;
   +   u8 func_map;/* bit map. lsb is fn 0. */
   +} pci_dev_dma_multi_func_sources[] = {
   +   { PCI_VENDOR_ID_MARVELL_2, 0x9123, (10)|(11)},
   +   { PCI_VENDOR_ID_MARVELL_2, 0x9125, (10)|(11)},
   +   { PCI_VENDOR_ID_MARVELL_2, 0x9128, (10)|(11)},
   +   { PCI_VENDOR_ID_MARVELL_2, 0x9130, (10)|(11)},
   +   { PCI_VENDOR_ID_MARVELL_2, 0x9143, (10)|(11)},
   +   { PCI_VENDOR_ID_MARVELL_2, 0x9172, (10)|(11)},
   +   { 0 }
   +};
  
  
   Adding another buggy device.  I have a Ricoh multifunction device:
  
   17:00.0 SD Host controller: Ricoh Co Ltd MMC/SD Host Controller (rev
01)
   17:00.3 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 PCIe IEEE 1394
   Controller (rev 01)
  
   17:00.0 0805: 1180:e822 (rev 01)
   17:00.3 0c00: 1180:e832 (rev 01)
  

The Ricoh device issue has been known for some time and a quirk has been
available since commit 12ea6cad1c7d046 in June 2012.  It's slightly
different than the problem this patch tries to work around [1].



Hmm, I've had this problem with many recent (vanilla) kernels, up to and
including 3.9-rc5



   that adding entries for also fixed booting.  I don't have any SD
cards or firewire devices handy to test that they work, but the system
now boots, which was not the case without your patch and IOMMU/DMAR
enabled.

That is really strange. Could you tell us what kernel version you tested
and provide dmesg output?



I'll capture a vanilla 3.8.5 boot without any patches and iommu=off, then
try to find another machine to catch what I can of a netconsole boot with
iommu=on.  What's the preferred way to send these?  pastebin links?

I'd been running the 'dirty' fix that's in the redhat bugzilla entry.  I
checked my .config and have CONFIG_PCI_QUIRKS=y, and verified my devices are
in the quirks table for the pci_func_0_dma_source fixup.


Do you mean that even though your hardware is specifically listed in
the quirk table, the quirk simply hasn't worked for you? That would be
unfortunate, to say the least.


Precisely.


The fedora kernel included a separate patch for this issue until
recently (see https://bugzilla.redhat.com/show_bug.cgi?id=880051).  It
basically just disabled DMAR when the Ricoh device is present, the
same as the patch to the mailing list you mentioned.


Yes, that's what I've been avoiding doing.  Every new release, I boot 
once with iommu=on, and firewire blacklisted, boot up, load the firewire 
driver.  This has caused the 'Ricoh DMAR' bug on every kernel since I 
got the laptop.  I then reboot and 



Is the dirty fix you're referring to comment 7?
https://bugzilla.redhat.com/show_bug.cgi?id=605888#c7


Apply this patch, which has worked fine for me, but per a commend in a 
thread I created here on 10/19/2012[1], this has a potential significant 
performance impact.  In my use case, a performance hit is worth the cost 
for the features.


However, your patch(while not intended to be the fix), actually solves 
the issue on my machine.  I don't know if it also has the potential 
performance impact, but it's certainly not noticeably worse in my use case.


Pat Erley

[1] http://marc.info/?l=linux-pcim=135094489232548w=2
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu