Re: [PATCH] documentation: iommu: add description of ARM System MMU binding

2013-04-24 Thread Will Deacon
On Tue, Apr 23, 2013 at 11:54:53PM +0100, Olav Haugan wrote:
 Hi Will,

Hello again,

 On 4/18/2013 12:01 PM, Will Deacon wrote:
  No. The device-tree describes the *hardware*, as per usual. The StreamIDs
  are fixed properties of the SoC and we can't change them from Linux, so we
  describe all of the StreamIDs upstream of each SMMU so that we can program
  the thing. There's no generic way to discover them.
 
 I meant that you are putting phandles to bus masters in the device tree
 that use/need the SMMU for VA2PA translation. Doesn't this put a
 dependency on the bus master devices? How will this work? Let's say that
 we have a device during bootup that needs to use the SMMU (such as a
 display processor [DP]). Don't you have a cyclic dependency? The SMMU
 device needs the DP device (phandle) but the DP device needs the SMMU to
 initialize its device driver?

That's a problem you have with the driver model anyway. The SMMU must be
registered on the same bus as the device before it can be used and the
devices must be added to that bus before they can be attached to a domain.

Getting phandles has no dependencies on anything -- the only dependency is
that the device is added to the bus on which the SMMU sits, just like every
other IOMMU driver.

  Why would a page table shared between devices require multiple context
  banks? Multiple SMRs and S2CRs, sure, but they would ultimately point at the
  same context bank (and hence same address space).
 
 What if you want to have 2 different VMIDs being generated? Also, what
 about TLB management? If I have two context banks I can invalidate only
 entries associated with 1 of the context banks (if VMID differ for the
 two context banks).

Huh? We allocate one VMID per domain. If you want more VMIDs, use more
domains. TLB invalidation is per-domain, so there's no issue there.
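The one-VMID-per-domain scheme can be sketched abstractly. The names below are invented for illustration and are not the arm-smmu driver's code; they just model the property that each domain allocates a unique VMID and TLB invalidation is scoped by it:

```c
#include <assert.h>

/* Toy model of "one VMID per domain": a domain gets its VMID at
 * creation, and invalidation tagged with that VMID never touches
 * another domain's TLB entries. */
struct toy_domain {
	int vmid;
};

static int next_vmid = 1;

static struct toy_domain domain_alloc_model(void)
{
	struct toy_domain d = { .vmid = next_vmid++ };
	return d;
}

/* Returns 1 if invalidating domain @d would flush an entry tagged
 * with @vmid, 0 otherwise. */
static int tlb_inval_hits(const struct toy_domain *d, int vmid)
{
	return d->vmid == vmid;
}
```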

  If a master needs to be in two address spaces at once, then it will need to
  attach its StreamIDs to different domains. You can't place a single
  StreamID in two address spaces (this is an architectural constraint).
 
 Yes, you would have a separate domain. I am just wondering how I would
 model this in DT using the bindings that you are proposing? How does it
 work? The bindings specify bus masters to StreamIDs. So if I call attach
 with the master device you will allocate a context bank and program the
 StreamIDs as specified in the DT. So now if I want to have another
 context bank associated with the same master device what do I do? I call
 into attach with a new domain but with the same master device but the
 master device is already attached to a context/domain.

Why would you want to place a StreamID into two domains? That doesn't make
any sense and isn't even supported by the architecture (things like
conflicting SMR entries may not even be reported).
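The architectural rule under discussion (a single StreamID cannot be placed in two address spaces at once) can be modelled in a few lines. This is a toy model of the rule with invented names, not the arm-smmu driver:

```c
#include <assert.h>

/* Each StreamID owns at most one domain/context bank, so a second
 * attach must be refused until the first is detached. */
#define NUM_SIDS 8

static int sid_to_domain[NUM_SIDS]; /* 0 = unattached */

/* Returns 0 on success, -1 if the SID already lives in a domain. */
static int attach_streamid(int sid, int domain)
{
	if (sid_to_domain[sid] != 0)
		return -1; /* already in another address space */
	sid_to_domain[sid] = domain;
	return 0;
}

static void detach_streamid(int sid)
{
	sid_to_domain[sid] = 0;
}
```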

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/2 V2] iommu/amd: Add workaround for ERBT1312

2013-04-24 Thread Joerg Roedel
On Tue, Apr 23, 2013 at 09:22:45AM -0400, Don Dutile wrote:
 Given other threads on this mail list (and I've seen crashes with the same
 problem) where this type of logging during a flood of IOMMU errors will
 lock up the machine, is there something that can be done to break the
 do-while loop after n iterations have been exec'd, so the kernel can
 progress during a crash?

In the case of an IOMMU error flood this loop will only run until the
event-log/ppr-log overflows. So it should not turn into an endless loop.
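The bounded-loop argument can be illustrated with a minimal ring-buffer consumer: the loop only walks from head to tail, so one call can process at most one pass over the log before the hardware would flag an overflow. This is an illustrative sketch with invented names, not the amd_iommu driver's code:

```c
#include <assert.h>

#define LOG_SIZE 16

/* Consume entries between head and tail; the index wraps but the
 * loop terminates as soon as head catches tail, so iterations are
 * bounded by the log size. Returns the number of entries handled. */
static int drain_event_log(unsigned int head, unsigned int tail)
{
	int handled = 0;

	while (head != tail) {            /* do-while in the real driver */
		/* process_one_event(log[head]); */
		head = (head + 1) % LOG_SIZE;
		handled++;
	}
	return handled;
}
```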


Joerg




Re: [PATCH 2/3 v14] iommu/fsl: Add additional iommu attributes required by the PAMU driver.

2013-04-24 Thread Joerg Roedel
On Tue, Apr 23, 2013 at 02:10:25PM +, Sethi Varun-B16395 wrote:
 I think it's fine to have the header under linux; actually I also see the
 intel-iommu header under linux.

Yes, the difference is that VT-d runs on x86 and on ia64. So there is no
single arch where the header could be placed. The amd-iommu.h file on
the other hand is x86 only and should also be moved to asm/, as I just
found out :)

And as long as PAMU is only needed on a single architecture the header
should also be arch-specific. If that changes someday the header can be
moved to a generic place.


Joerg




[PATCH 1/2 v15] iommu/fsl: Add additional iommu attributes required by the PAMU driver.

2013-04-24 Thread Varun Sethi
Added the following domain attributes for the FSL PAMU driver:
1. Added new iommu stash attribute, which allows setting of the
   LIODN specific stash id parameter through IOMMU API.
2. Added an attribute for enabling/disabling DMA to a particular
   memory window.
3. Added domain attribute to check for PAMUV1 specific constraints.

Signed-off-by: Varun Sethi varun.se...@freescale.com
---
v15 changes:
- Moved fsl_pamu_stash.h under arch/powerpc/include/asm.
v14 changes:
- Add FSL prefix to PAMU attributes.
v13 changes:
- created a new file include/linux/fsl_pamu_stash.h for stash
attributes.
v12 changes:
- Moved PAMU specific stash ids and structures to PAMU header file.
- no change in v11.
- no change in v10.
 arch/powerpc/include/asm/fsl_pamu_stash.h |   39 +
 include/linux/iommu.h |   16 
 2 files changed, 55 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/include/asm/fsl_pamu_stash.h

diff --git a/arch/powerpc/include/asm/fsl_pamu_stash.h b/arch/powerpc/include/asm/fsl_pamu_stash.h
new file mode 100644
index 000..caa1b21
--- /dev/null
+++ b/arch/powerpc/include/asm/fsl_pamu_stash.h
@@ -0,0 +1,39 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright (C) 2013 Freescale Semiconductor, Inc.
+ *
+ */
+
+#ifndef __FSL_PAMU_STASH_H
+#define __FSL_PAMU_STASH_H
+
+/* cache stash targets */
+enum pamu_stash_target {
+   PAMU_ATTR_CACHE_L1 = 1,
+   PAMU_ATTR_CACHE_L2,
+   PAMU_ATTR_CACHE_L3,
+};
+
+/*
+ * This attribute allows configuring stashing-specific parameters
+ * in the PAMU hardware.
+ */
+
+struct pamu_stash_attribute {
+   u32 cpu;/* cpu number */
+   u32 cache;  /* cache to stash to: L1,L2,L3 */
+};
+
+#endif  /* __FSL_PAMU_STASH_H */
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2727810..313d17a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -57,10 +57,26 @@ struct iommu_domain {
 #define IOMMU_CAP_CACHE_COHERENCY  0x1
 #define IOMMU_CAP_INTR_REMAP   0x2 /* isolates device intrs */
 
+/*
+ * Following constraints are specific to FSL_PAMUV1:
+ *  -aperture must be power of 2, and naturally aligned
+ *  -number of windows must be power of 2, and address space size
+ *   of each window is determined by aperture size / # of windows
+ *  -the actual size of the mapped region of a window must be power
+ *   of 2 starting with 4KB and physical address must be naturally
+ *   aligned.
+ * DOMAIN_ATTR_FSL_PAMUV1 corresponds to the above-mentioned constraints.
+ * The caller can invoke iommu_domain_get_attr to check if the underlying
+ * iommu implementation supports these constraints.
+ */
+
 enum iommu_attr {
DOMAIN_ATTR_GEOMETRY,
DOMAIN_ATTR_PAGING,
DOMAIN_ATTR_WINDOWS,
+   DOMAIN_ATTR_FSL_PAMU_STASH,
+   DOMAIN_ATTR_FSL_PAMU_ENABLE,
+   DOMAIN_ATTR_FSL_PAMUV1,
DOMAIN_ATTR_MAX,
 };
 
-- 
1.7.4.1
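For reference, a caller would hand the stash attribute to a domain through iommu_domain_set_attr(). The sketch below models that call in a self-contained way: the enum and struct are copied from the patch, while fake_domain and set_attr are stand-ins invented for illustration, not kernel code.

```c
#include <assert.h>
#include <string.h>

typedef unsigned int u32;

/* Copied from the patch. */
enum pamu_stash_target {
	PAMU_ATTR_CACHE_L1 = 1,
	PAMU_ATTR_CACHE_L2,
	PAMU_ATTR_CACHE_L3,
};

struct pamu_stash_attribute {
	u32 cpu;	/* cpu number */
	u32 cache;	/* cache to stash to: L1, L2, L3 */
};

enum iommu_attr {
	DOMAIN_ATTR_GEOMETRY,
	DOMAIN_ATTR_PAGING,
	DOMAIN_ATTR_WINDOWS,
	DOMAIN_ATTR_FSL_PAMU_STASH,
	DOMAIN_ATTR_FSL_PAMU_ENABLE,
	DOMAIN_ATTR_FSL_PAMUV1,
	DOMAIN_ATTR_MAX,
};

/* Invented stand-ins modelling iommu_domain_set_attr(). */
struct fake_domain { struct pamu_stash_attribute stash; };

static int set_attr(struct fake_domain *d, enum iommu_attr attr, void *data)
{
	if (attr != DOMAIN_ATTR_FSL_PAMU_STASH)
		return -1; /* only the stash attribute is modelled here */
	memcpy(&d->stash, data, sizeof(d->stash));
	return 0;
}
```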




RE: RFC: vfio / iommu driver for hardware with no iommu

2013-04-24 Thread Sethi Varun-B16395


 -Original Message-
 From: iommu-boun...@lists.linux-foundation.org [mailto:iommu-
 boun...@lists.linux-foundation.org] On Behalf Of Joerg Roedel
 Sent: Wednesday, April 24, 2013 4:27 PM
 To: Yoder Stuart-B08248
 Cc: iommu@lists.linux-foundation.org
 Subject: Re: RFC: vfio / iommu driver for hardware with no iommu
 
 On Tue, Apr 23, 2013 at 04:13:00PM +, Yoder Stuart-B08248 wrote:
  We're aware of the obvious limitations-- no protection, DMA'able
  memory must be physically contiguous and will have no iova->phys
  translation.  But there are use cases where all OSes involved are
  trusted and customers can
  live with those limitations.   Virtualization is used
  here not to sandbox untrusted code, but to consolidate multiple OSes.
 
 One of the major points of VFIO is to provide a userspace interface for
 hardware IOMMUs. So if you have a platform without an IOMMU why do you
 care about VFIO at all?
 
Agreed, but vfio also provides a standardized interface for direct device 
assignment under KVM. 

-Varun




Re: [PATCH 1/2 V2] iommu/amd: Add workaround for ERBT1312

2013-04-24 Thread Don Dutile

On 04/24/2013 06:46 AM, Joerg Roedel wrote:

On Tue, Apr 23, 2013 at 09:22:45AM -0400, Don Dutile wrote:

Given other threads on this mail list (and I've seen crashes with the same
problem) where this type of logging during a flood of IOMMU errors will
lock up the machine, is there something that can be done to break the
do-while loop after n iterations have been exec'd, so the kernel can
progress during a crash?


In the case of an IOMMU error flood this loop will only run until the
event-log/ppr-log overflows. So it should not turn into an endless loop.


Joerg



Thanks for verification.



RE: RFC: vfio / iommu driver for hardware with no iommu

2013-04-24 Thread Yoder Stuart-B08248


 -Original Message-
 From: Bhushan Bharat-R65777
 Sent: Wednesday, April 24, 2013 6:04 AM
 To: Joerg Roedel; Yoder Stuart-B08248
 Cc: iommu@lists.linux-foundation.org
 Subject: RE: RFC: vfio / iommu driver for hardware with no iommu
 
 
 
  -Original Message-
  From: iommu-boun...@lists.linux-foundation.org [mailto:iommu-
  boun...@lists.linux-foundation.org] On Behalf Of Joerg Roedel
  Sent: Wednesday, April 24, 2013 4:27 PM
  To: Yoder Stuart-B08248
  Cc: iommu@lists.linux-foundation.org
  Subject: Re: RFC: vfio / iommu driver for hardware with no iommu
 
  On Tue, Apr 23, 2013 at 04:13:00PM +, Yoder Stuart-B08248 wrote:
   We're aware of the obvious limitations-- no protection, DMA'able
   memory must be physically contiguous and will have no iova->phys
   translation.  But there are use cases where all OSes involved are
   trusted and customers can
   live with those limitations.   Virtualization is used
   here not to sandbox untrusted code, but to consolidate multiple OSes.
 
  One of the major points of VFIO is to provide a userspace interface for 
  hardware
  IOMMUs. So if you have a platform without an IOMMU why do you care about 
  VFIO at
  all?
 
 We want to do direct device assignment to user space.
 So if the device is behind iommu then it will be a secure interface.
 if device is not behind a iommu then it is insecure. But the user space can 
 access the device. This way
 we can be consistent with one mechanism to do direct device assignment.

And more specifically, there's the desire to assign things like
a PCI device to a KVM virtual machine on a platform without
an iommu.   QEMU uses vfio-pci and we don't want to implement
a completely separate user space approach to do this.  We want
QEMU to stay (mostly) unchanged.

Stuart




Re: [PATCH] iommu: add a function to find an iommu group by id

2013-04-24 Thread Joerg Roedel
On Mon, Mar 25, 2013 at 10:23:49AM +1100, Alexey Kardashevskiy wrote:
 As IOMMU groups are exposed to the user space by their numbers,
 the user space can use them in various kernel APIs so the kernel
 might need an API to find a group by its ID.
 
 As an example, QEMU VFIO on PPC64 platform needs it to associate
 a logical bus number (LIOBN) with a specific IOMMU group in order
 to support in-kernel handling of DMA map/unmap requests.
 
 The patch adds the iommu_group_get_by_id(id) function which performs
 such search.
 
 v2: fixed reference counting.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

Applied to core branch, thanks Alexey.




Re: RFC: vfio / iommu driver for hardware with no iommu

2013-04-24 Thread Don Dutile

On 04/23/2013 03:47 PM, Alex Williamson wrote:

On Tue, 2013-04-23 at 19:16 +, Yoder Stuart-B08248 wrote:



-Original Message-
From: Alex Williamson [mailto:alex.william...@redhat.com]
Sent: Tuesday, April 23, 2013 11:56 AM
To: Yoder Stuart-B08248
Cc: Joerg Roedel; iommu@lists.linux-foundation.org
Subject: Re: RFC: vfio / iommu driver for hardware with no iommu

On Tue, 2013-04-23 at 16:13 +, Yoder Stuart-B08248 wrote:

Joerg/Alex,

We have embedded systems where we use QEMU/KVM and have
the requirement to do device assignment, but have no
iommu.  So we would like to get vfio-pci working on
systems like this.

We're aware of the obvious limitations-- no protection,
DMA'able memory must be physically contiguous and will
have no iova->phys translation.  But there are use cases
where all OSes involved are trusted and customers can
live with those limitations.   Virtualization is used
here not to sandbox untrusted code, but to consolidate
multiple OSes.

We would like to get your feedback on the rough idea.  There
are two parts-- iommu driver and vfio-pci.

1.  iommu driver

First, we still need device groups created because vfio
is based on that, so we envision a 'dummy' iommu
driver that implements only  the add/remove device
ops.  Something like:

 static struct iommu_ops fsl_none_ops = {
         .add_device    = fsl_none_add_device,
         .remove_device = fsl_none_remove_device,
 };

 int fsl_iommu_none_init(void)
 {
         int ret = 0;

         ret = iommu_init_mempool();
         if (ret)
                 return ret;

         bus_set_iommu(&platform_bus_type, &fsl_none_ops);
         bus_set_iommu(&pci_bus_type, &fsl_none_ops);

         return ret;
 }

2.  vfio-pci

For vfio-pci, we would ideally like to keep user space mostly
unchanged.  User space will have to follow the semantics
of mapping only physically contiguous chunks...and iova
will equal phys.

So, we propose to implement a new vfio iommu type,
called VFIO_TYPE_NONE_IOMMU.  This implements
any needed vfio interfaces, but there are no calls
to the iommu layer...e.g. map_dma() is a noop.

Would like your feedback.


My first thought is that this really detracts from vfio and iommu groups
being a secure interface, so somehow this needs to be clearly an
insecure mode that requires an opt-in and maybe taints the kernel.  Any
notion of unprivileged use needs to be blocked and it should test
CAP_COMPROMISE_KERNEL (or whatever it's called now) at critical access
points.  We might even have interfaces exported that would allow this to
be an out-of-tree driver (worth a check).

I would guess that you would probably want to do all the iommu group
setup from the vfio fake-iommu driver.  In other words, that driver both
creates the fake groups and provides the dummy iommu backend for vfio.
That would be a nice way to compartmentalize this as a
vfio-noiommu-special.


So you mean don't implement any of the iommu driver
ops at all and keep everything in the vfio layer?

Would you still have real iommu groups?...i.e.
$ readlink /sys/bus/pci/devices/0000:06:0d.0/iommu_group
../../../../kernel/iommu_groups/26

...and that is created by vfio-noiommu-special?


I'm suggesting (but haven't checked if it's possible), to implement the
iommu driver ops as part of the vfio iommu backend driver.  The primary
motivation for this would be to a) keep a fake iommu groups interface
out of the iommu proper (possibly containing it in an external driver)
and b) modularizing it so we don't have fake iommu groups being created
by default.  It would have to populate the iommu groups sysfs interfaces
to be compatible with vfio.


Right now when the PCI and platform buses are probed,
the iommu driver add-device callback gets called and
that is where the per-device group gets created.  Are
you envisioning registering a callback for the PCI
bus to do this in vfio-noiommu-special?


Yes.  It's just as easy to walk all the devices rather than doing
callbacks, iirc the group code does this when you register.  In fact,
this noiommu interface may not want to add all devices, we may want to
be very selective and only add some.


Right.
Sounds like a no-iommu driver is needed to leave vfio unaffected,
and still leverage/use vfio for qemu's device assignment.
Just not sure how to 'taint' it as 'not secure' if a no-iommu driver is put in place.

btw -- qemu has the inherent assumption that pci cfg cycles are trapped,
   so assigned devices are 'remapped' from system-B:D.F to virt-machine's
   (virtualized) B:D.F of the assigned device.
   Are pci-cfg cycles trapped in the freescale qemu model?


Would map/unmap really be no-ops?  Seems like you still want to do page
pinning.


You're right, that was a bad example...most would be no ops though.


Also, you're using fsl in the example above, but would such a
driver have any platform dependency?


This wouldn't have to be fsl specific if we thought it was
potentially 

Re: [PATCH] Reset PCIe devices to stop ongoing DMA

2013-04-24 Thread Don Dutile

On 04/24/2013 12:58 AM, Takao Indoh wrote:

This patch resets PCIe devices on boot to stop ongoing DMA. When
pci=pcie_reset_devices is specified, a hot reset is triggered on each
PCIe root port and downstream port to reset its downstream endpoint.

Problem:
This patch solves the problem that kdump can fail when intel_iommu=on is
specified. When intel_iommu=on is specified, many dma-remapping errors
occur in the second kernel and cause problems like driver errors or PCI
SERR; in the end kdump fails. This problem is caused as follows.
1) Devices are working on the first kernel.
2) Switch to the second kernel (kdump kernel). The devices are still working
and their DMA continues during this switch.
3) The iommu is initialized during second kernel boot and the ongoing DMA
causes dma-remapping errors.

Solution:
All DMA transactions have to be stopped before iommu is initialized. By
this patch devices are reset and in-flight DMA is stopped before
pci_iommu_init.

To invoke a hot reset on an endpoint, its upstream link needs to be reset.
reset_pcie_devices() is called from fs_initcall_sync; it finds each root
port/downstream port whose child is a PCIe endpoint and then resets the link
between them. If the endpoint is a VGA device, it is skipped because the
monitor blacks out if the VGA controller is reset.


Couple of questions wrt the VGA device:
(1) Many graphics devices are multi-function, one function being VGA;
    is the VGA always function 0, so this scan sees it first and doesn't
    do a reset on that PCIe link?  If the VGA is not function 0, won't
    this logic break (will reset b/c function 0 is non-VGA graphics)?
(2) I'm hearing VGA will soon not be a required console; this logic
    assumes it is, which is why it isn't blanked.
Q: Should the filter be based on a device having a device-class of display?
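On the device-class question: PCI class codes are 24 bits with the base class in bits 23:16, and the patch compares (class >> 16) against PCI_BASE_CLASS_DISPLAY, which catches every display-class function regardless of which function number it sits at. A minimal self-contained sketch of that check:

```c
#include <assert.h>

/* Base class lives in bits 23:16 of the 24-bit class code;
 * 0x03 is the display base class (VGA, XGA, 3D, other). */
#define PCI_BASE_CLASS_DISPLAY 0x03

static int is_display_class(unsigned int class_code)
{
	return (class_code >> 16) == PCI_BASE_CLASS_DISPLAY;
}
```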


Actually this is v8 patch but quite different from v7 and it's been so
long since previous post, so I start over again.

Thanks for this re-start.  I need to continue reviewing the rest.

Q: Why not force the IOMMU off when re-booting into the kexec kernel to perform
   a crash dump?  After the crash dump, the system reboots to the previous
   (iommu=on) setting.
   That logic, along w/ your previous patch to disable the IOMMU if iommu=off
   is set, would remove this (relatively slow) PCI init sequencing?


Previous post:
[PATCH v7 0/5] Reset PCIe devices to address DMA problem on kdump
https://lkml.org/lkml/2012/11/26/814

Signed-off-by: Takao Indoh indou.ta...@jp.fujitsu.com
---
  Documentation/kernel-parameters.txt |2 +
  drivers/pci/pci.c   |  103 +++
  2 files changed, 105 insertions(+), 0 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 4609e81..2a31ade 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2250,6 +2250,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
any pair of devices, possibly at the cost of
reduced performance.  This also guarantees
that hot-added devices will work.
+   pcie_reset_devices  Reset PCIe endpoint on boot by hot reset
cbiosize=nn[KMG]The fixed amount of bus space which is
reserved for the CardBus bridge's IO window.
The default value is 256 bytes.
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b099e00..42385c9 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3878,6 +3878,107 @@ void __weak pci_fixup_cardbus(struct pci_bus *bus)
  }
  EXPORT_SYMBOL(pci_fixup_cardbus);

+/*
+ * Return true if dev is PCIe root port or downstream port whose child is PCIe
+ * endpoint except VGA device.
+ */
+static int __init need_reset(struct pci_dev *dev)
+{
+   struct pci_bus *subordinate;
+   struct pci_dev *child;
+
+   if (!pci_is_pcie(dev) || !dev->subordinate ||
+       list_empty(&dev->subordinate->devices) ||
+       ((pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT) &&
+        (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM)))
+           return 0;
+
+   subordinate = dev->subordinate;
+   list_for_each_entry(child, &subordinate->devices, bus_list) {
+       if ((pci_pcie_type(child) == PCI_EXP_TYPE_UPSTREAM) ||
+           (pci_pcie_type(child) == PCI_EXP_TYPE_PCI_BRIDGE) ||
+           ((child->class >> 16) == PCI_BASE_CLASS_DISPLAY))
+               /* Don't reset switch, bridge, VGA device */
+               return 0;
+   }
+
+   return 1;
+}
+
+static void __init save_config(struct pci_dev *dev)
+{
+   struct pci_bus *subordinate;
+   struct pci_dev *child;
+
+   if (!need_reset(dev))
+   return;
+
+   subordinate = dev->subordinate;
+   list_for_each_entry(child, &subordinate->devices,