Re: Summary of LPC guest MSI discussion in Santa Fe

2016-11-08 Thread Auger Eric
Hi Will,
On 08/11/2016 20:02, Don Dutile wrote:
> On 11/08/2016 12:54 PM, Will Deacon wrote:
>> On Tue, Nov 08, 2016 at 03:27:23PM +0100, Auger Eric wrote:
>>> On 08/11/2016 03:45, Will Deacon wrote:
 Rather than treat these as separate problems, a better interface is to
 tell userspace about a set of reserved regions, and have this include
 the MSI doorbell, irrespective of whether or not it can be remapped.
 Don suggested that we statically pick an address for the doorbell in a
 similar way to x86, and have the kernel map it there. We could even
 pick
 0xfee0. If it conflicts with a reserved region on the platform (due
 to (4)), then we'd obviously have to (deterministically?) allocate it
 somewhere else, but probably within the bottom 4G.
>>> This is tentatively achieved now with
>>> [1] [RFC v2 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 - Alt II
>>> (http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1264506.html)
>>>
>> Yup, I saw that fly by. Hopefully some of the internals can be reused
>> with the current thinking on user ABI.
>>
 The next question is how to tell userspace about all of the reserved
 regions. Initially, the idea was to extend VFIO, however Alex pointed
 out a horrible scenario:

1. QEMU spawns a VM on system 0
2. VM is migrated to system 1
3. QEMU attempts to passthrough a device using PCI hotplug

 In this scenario, the guest memory map is chosen at step (1), yet there
 is no VFIO fd available to determine the reserved regions. Furthermore,
 the reserved regions may vary between system 0 and system 1. This
 pretty
 much rules out using VFIO to determine the reserved regions.Alex
 suggested
 that the SMMU driver can advertise the regions via
 /sys/class/iommu/. This
 would solve part of the problem, but migration between systems with
 different memory maps can still cause problems if the reserved regions
 of the new system conflict with the guest memory map chosen by QEMU.
>>>
>>> OK so I understand we do not want anymore the VFIO chain capability API
>>> (patch 5 of above series) but we prefer a sysfs approach instead.
>> Right.
>>
>>> I understand the sysfs approach which allows the userspace to get the
>>> info earlier and independently on VFIO. Keeping in mind current QEMU
>>> virt - which is not the only userspace - will not do much from this info
>>> until we bring upheavals in virt address space management. So if I am
>>> not wrong, at the moment the main action to be undertaken is the
>>> rejection of the PCI hotplug in case we detect a collision.
>> I don't think so; it should be up to userspace to reject the hotplug.
>> If userspace doesn't have support for the regions, then that's fine --
>> you just end up in a situation where the CPU page table maps memory
>> somewhere that the device can't see. In other words, you'll end up with
>> spurious DMA failures, but that's exactly what happens with current
>> systems
>> if you passthrough an overlapping region (Robin demonstrated this on
>> Juno).
>>
>> Additionally, you can imagine some future support where you can tell the
>> guest not to use certain regions of its memory for DMA. In this case, you
>> wouldn't want to refuse the hotplug in the case of overlapping regions.
>>
>> Really, I think the kernel side just needs to enumerate the fixed
>> reserved
>> regions, place the doorbell at a fixed address and then advertise these
>> via sysfs.
>>
>>> I can respin [1]
>>> - studying and taking into account Robin's comments about dm_regions
>>> similarities
>>> - removing the VFIO capability chain and replacing this by a sysfs API
>> Ideally, this would be reusable between different SMMU drivers so the
>> sysfs
>> entries have the same format etc.
>>
>>> Would that be OK?
>> Sounds good to me. Are you in a position to prototype something on the
>> qemu
>> side once we've got kernel-side agreement?
yes sure.
>>
>>> What about Alex comments who wanted to report the usable memory ranges
>>> instead of unusable memory ranges?
>>>
>>> Also did you have a chance to discuss the following items:
>>> 1) the VFIO irq safety assessment
>> The discussion really focussed on system topology, as opposed to
>> properties
>> of the doorbell. Regardless of how the device talks to the doorbell, if
>> the doorbell can't protect against things like MSI spoofing, then it's
>> unsafe. My opinion is that we shouldn't allow passthrough by default on
>> systems with unsafe doorbells (we could piggyback on
>> allow_unsafe_interrupts
>> cmdline option to VFIO).
OK.
>>
>> A first step would be making all this opt-in, and only supporting GICv3
>> ITS for now.
> You're trying to support a config that is < GICv3 and no ITS ? ...
> That would be the equiv. of x86 pre-intr-remap, and that's why
> allow_unsafe_interrupts
> hook was created ... to enable devel/kick-the-tires.
>>> 2) the MSI reserved size computation (is an arbitr

RE: [PATCH V3 0/8] IOMMU probe deferral support

2016-11-08 Thread Sricharan
Hi Robin,

>On 04/11/16 15:16, Sricharan wrote:
>> Hi Robin,
>>
> Yikes, on second look, that definitely shouldn't be happening.
> Everything below is probably the resulting fallout.

 [   40.206703] vfio-pci :08:00.0: Failed to setup iommu ops

 I think the above print which says "failed to setup iommu_ops"
 because the call ops->add_device failed in of_pci_iommu_configure
 is the reason for the failure, in my case i simply do not get this even 
 with
 your scripts. ops->add_device succeeds in the rebind as well. So still
 checking what could be happening in your case.
>>>
>>> I was looking at your code base from [1].The ops->add_device
>>> callback from of_pci_iommu_configure on the rebind is the
>>> one which is causing the failure. But not able to spot out
>>>from code which point is causing the failure. It would be very helpful
>>> if i can know which is the return value from the add_device callback
>>> or point inside add_device callback which fails in your setup.
>>>
>>>
>>> [1] git://linux-arm.org/linux-rm iommu/misc
>>
>> With little more try, i saw an issue where i had an failure
>> similar to what you reported. The issue happens when multiple
>> devices fall in to same group due to matching sids. I ended up
>> doing a fix like below and it would be nice to verify if it is the same
>> that we are seeing in your setup and if the fix makes a difference ?
>>
>> From: Sricharan R 
>> Date: Fri, 4 Nov 2016 20:28:49 +0530
>> Subject: [PATCH] iommu/arm-smmu: Fix group's reference counting
>>
>> iommu_group_get_for_dev which gets called in the add_device
>> callback, increases the reference count of the iommu_group,
>> so we do an iommu_group_put after that. iommu_group_get_for_dev
>> inturn calls device_group callback and in the case of arm-smmu
>> we call generic_device_group/pci_device_group which takes
>> care of increasing the group's reference. But when we return
>> an already existing group(when multiple devices have same group)
>> the reference is not incremented, resulting in issues when the
>> remove_device callback for the devices is invoked.
>> Fixing the same here.
>
>Bah, yes, this does look like my fault - after flip-flopping between
>about 3 different ways to keep refcounts for the S2CR entries, none of
>which would quite work, I ripped it all out but apparently still got
>things wrong, oh well. Thanks for figuring it out.
 >
>On the probe-deferral angle, whilst it's useful to have uncovered this
>bug, I don't think we should actually be calling remove_device() from
>DMA teardown. I think it's preferable from a user perspective if group
>numbering remains stable, rather than changing depending on the order in
>which they unbind/rebind VFIO drivers. I'm really keen to try and get
>this in shape for 4.10, so I've taken the liberty of hacking up my own
>branch (iommu/defer) based on v3 - would you mind taking a look at the
>two "iommu/of:" commits to see what you think? (Ignore the PCI changes
>to your later patches - that was an experiment which didn't really work out)

Ok, will take a look at this now and respond more on this.

>
>> Signed-off-by: Sricharan R 
>> ---
>>  drivers/iommu/arm-smmu.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 71ce4b6..a1d0b3c 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -1516,8 +1516,10 @@ static struct iommu_group 
>> *arm_smmu_device_group(struct device *dev)
>>  group = smmu->s2crs[idx].group;
>>  }
>>
>> -if (group)
>> +if (group) {
>> +iommu_group_get_by_id(iommu_group_id(group));
>>  return group;
>
>This might as well just be inline, i.e.:
>
>   return iommu_group_get_by_id(iommu_group_id(group));
>
>It's a shame we have to go all round the houses when we have the group
>right there, but this is probably the most expedient fix. I guess we can
>extend the API with some sort of iommu_group_get(group) overload in
>future if we really want to.
>

ok, i can send this fix separately then. Otherwise, Will was suggesting on the
other thread that there should probably be a separate API to increment
the group refcount or get the group from the existing aliasing device.
As per me adding the api, looks like another option or post the above ?

Regards,
 Sricharan

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[v16, 7/7] mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0

2016-11-08 Thread Yangbo Lu
The eSDHC of T4240-R1.0-R2.0 has incorrect vender version and spec version.
Acturally the right version numbers should be VVN=0x13 and SVN = 0x1.
This patch adds the GUTS driver support for eSDHC driver to match SoC.
And fix host version to avoid that incorrect version numbers break down
the ADMA data transfer.

Signed-off-by: Yangbo Lu 
Acked-by: Ulf Hansson 
Acked-by: Scott Wood 
Acked-by: Arnd Bergmann 
---
Changes for v2:
- Got SVR through iomap instead of dts
Changes for v3:
- Managed GUTS through syscon instead of iomap in eSDHC driver
Changes for v4:
- Got SVR by GUTS driver instead of SYSCON
Changes for v5:
- Changed to get SVR through API fsl_guts_get_svr()
- Combined patch 4, patch 5 and patch 6 into one
Changes for v6:
- Added 'Acked-by: Ulf Hansson'
Changes for v7:
- None
Changes for v8:
- Added 'Acked-by: Scott Wood'
Changes for v9:
- None
Changes for v10:
- None
Changes for v11:
- Changed to use soc_device_match
Changes for v12:
- Matched soc through .family field instead of .soc_id
Changes for v13:
- None
Changes for v14:
- None
Changes for v15:
- None
Changes for v16:
- Added 'Acked-by: Arnd'
---
 drivers/mmc/host/Kconfig  |  1 +
 drivers/mmc/host/sdhci-of-esdhc.c | 20 
 2 files changed, 21 insertions(+)

diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
index 5cf7eba..4128a3c 100644
--- a/drivers/mmc/host/Kconfig
+++ b/drivers/mmc/host/Kconfig
@@ -144,6 +144,7 @@ config MMC_SDHCI_OF_ESDHC
depends on MMC_SDHCI_PLTFM
depends on PPC || ARCH_MXC || ARCH_LAYERSCAPE
select MMC_SDHCI_IO_ACCESSORS
+   select FSL_GUTS
help
  This selects the Freescale eSDHC controller support.
 
diff --git a/drivers/mmc/host/sdhci-of-esdhc.c 
b/drivers/mmc/host/sdhci-of-esdhc.c
index fb71c86..57bdb9e 100644
--- a/drivers/mmc/host/sdhci-of-esdhc.c
+++ b/drivers/mmc/host/sdhci-of-esdhc.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "sdhci-pltfm.h"
 #include "sdhci-esdhc.h"
@@ -28,6 +29,7 @@
 struct sdhci_esdhc {
u8 vendor_ver;
u8 spec_ver;
+   bool quirk_incorrect_hostver;
 };
 
 /**
@@ -73,6 +75,8 @@ static u32 esdhc_readl_fixup(struct sdhci_host *host,
 static u16 esdhc_readw_fixup(struct sdhci_host *host,
 int spec_reg, u32 value)
 {
+   struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
+   struct sdhci_esdhc *esdhc = sdhci_pltfm_priv(pltfm_host);
u16 ret;
int shift = (spec_reg & 0x2) * 8;
 
@@ -80,6 +84,12 @@ static u16 esdhc_readw_fixup(struct sdhci_host *host,
ret = value & 0x;
else
ret = (value >> shift) & 0x;
+   /* Workaround for T4240-R1.0-R2.0 eSDHC which has incorrect
+* vendor version and spec version information.
+*/
+   if ((spec_reg == SDHCI_HOST_VERSION) &&
+   (esdhc->quirk_incorrect_hostver))
+   ret = (VENDOR_V_23 << SDHCI_VENDOR_VER_SHIFT) | SDHCI_SPEC_200;
return ret;
 }
 
@@ -558,6 +568,12 @@ static const struct sdhci_pltfm_data sdhci_esdhc_le_pdata 
= {
.ops = &sdhci_esdhc_le_ops,
 };
 
+static struct soc_device_attribute soc_incorrect_hostver[] = {
+   { .family = "QorIQ T4240", .revision = "1.0", },
+   { .family = "QorIQ T4240", .revision = "2.0", },
+   { },
+};
+
 static void esdhc_init(struct platform_device *pdev, struct sdhci_host *host)
 {
struct sdhci_pltfm_host *pltfm_host;
@@ -571,6 +587,10 @@ static void esdhc_init(struct platform_device *pdev, 
struct sdhci_host *host)
esdhc->vendor_ver = (host_ver & SDHCI_VENDOR_VER_MASK) >>
 SDHCI_VENDOR_VER_SHIFT;
esdhc->spec_ver = host_ver & SDHCI_SPEC_VER_MASK;
+   if (soc_device_match(soc_incorrect_hostver))
+   esdhc->quirk_incorrect_hostver = true;
+   else
+   esdhc->quirk_incorrect_hostver = false;
 }
 
 static int sdhci_esdhc_probe(struct platform_device *pdev)
-- 
2.1.0.27.g96db324

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[v16, 3/7] soc: fsl: add GUTS driver for QorIQ platforms

2016-11-08 Thread Yangbo Lu
The global utilities block controls power management, I/O device
enabling, power-onreset(POR) configuration monitoring, alternate
function selection for multiplexed signals,and clock control.

This patch adds a driver to manage and access global utilities block.
Initially only reading SVR and registering soc device are supported.
Other guts accesses, such as reading RCW, should eventually be moved
into this driver as well.

Signed-off-by: Yangbo Lu 
Acked-by: Arnd Bergmann 
---
Changes for v4:
- Added this patch
Changes for v5:
- Modified copyright info
- Changed MODULE_LICENSE to GPL
- Changed EXPORT_SYMBOL_GPL to EXPORT_SYMBOL
- Made FSL_GUTS user-invisible
- Added a complete compatible list for GUTS
- Stored guts info in file-scope variable
- Added mfspr() getting SVR
- Redefined GUTS APIs
- Called fsl_guts_init rather than using platform driver
- Removed useless parentheses
- Removed useless 'extern' key words
Changes for v6:
- Made guts thread safe in fsl_guts_init
Changes for v7:
- Removed 'ifdef' for function declaration in guts.h
Changes for v8:
- Fixes lines longer than 80 characters checkpatch issue
- Added 'Acked-by: Scott Wood'
Changes for v9:
- None
Changes for v10:
- None
Changes for v11:
- Changed to platform driver
Changes for v12:
- Removed "signed-off-by: Scott"
- Defined fsl_soc_die_attr struct array instead of
  soc_device_attribute
- Re-designed soc_device_attribute for QorIQ SoC
- Other minor fixes
Changes for v13:
- Rebased
- Removed text after 'bool' in Kconfig
- Removed ARCH ifdefs
- Added more bits for ls1021a mask
- Used devm
Changes for v14:
- Used devm_ioremap_resource
Changes for v15:
- Fixed error code for devm_ioremap_resource
Changes for v16:
- Removed header file svr.h and calculated REV_MAJ/MIN in this driver
- Added 'Acked-by: Arnd'
---
 drivers/soc/Kconfig  |   3 +-
 drivers/soc/fsl/Kconfig  |  18 
 drivers/soc/fsl/Makefile |   1 +
 drivers/soc/fsl/guts.c   | 236 +++
 include/linux/fsl/guts.h | 125 +++--
 5 files changed, 333 insertions(+), 50 deletions(-)
 create mode 100644 drivers/soc/fsl/Kconfig
 create mode 100644 drivers/soc/fsl/guts.c

diff --git a/drivers/soc/Kconfig b/drivers/soc/Kconfig
index e6e90e8..f31bceb 100644
--- a/drivers/soc/Kconfig
+++ b/drivers/soc/Kconfig
@@ -1,8 +1,7 @@
 menu "SOC (System On Chip) specific Drivers"
 
 source "drivers/soc/bcm/Kconfig"
-source "drivers/soc/fsl/qbman/Kconfig"
-source "drivers/soc/fsl/qe/Kconfig"
+source "drivers/soc/fsl/Kconfig"
 source "drivers/soc/mediatek/Kconfig"
 source "drivers/soc/qcom/Kconfig"
 source "drivers/soc/rockchip/Kconfig"
diff --git a/drivers/soc/fsl/Kconfig b/drivers/soc/fsl/Kconfig
new file mode 100644
index 000..7a9fb9b
--- /dev/null
+++ b/drivers/soc/fsl/Kconfig
@@ -0,0 +1,18 @@
+#
+# Freescale SOC drivers
+#
+
+source "drivers/soc/fsl/qbman/Kconfig"
+source "drivers/soc/fsl/qe/Kconfig"
+
+config FSL_GUTS
+   bool
+   select SOC_BUS
+   help
+ The global utilities block controls power management, I/O device
+ enabling, power-onreset(POR) configuration monitoring, alternate
+ function selection for multiplexed signals,and clock control.
+ This driver is to manage and access global utilities block.
+ Initially only reading SVR and registering soc device are supported.
+ Other guts accesses, such as reading RCW, should eventually be moved
+ into this driver as well.
diff --git a/drivers/soc/fsl/Makefile b/drivers/soc/fsl/Makefile
index 75e1f53..44b3beb 100644
--- a/drivers/soc/fsl/Makefile
+++ b/drivers/soc/fsl/Makefile
@@ -5,3 +5,4 @@
 obj-$(CONFIG_FSL_DPAA) += qbman/
 obj-$(CONFIG_QUICC_ENGINE) += qe/
 obj-$(CONFIG_CPM)  += qe/
+obj-$(CONFIG_FSL_GUTS) += guts.o
diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c
new file mode 100644
index 000..0ac8826
--- /dev/null
+++ b/drivers/soc/fsl/guts.c
@@ -0,0 +1,236 @@
+/*
+ * Freescale QorIQ Platforms GUTS Driver
+ *
+ * Copyright (C) 2016 Freescale Semiconductor, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct guts {
+   struct ccsr_guts __iomem *regs;
+   bool little_endian;
+};
+
+struct fsl_soc_die_attr {
+   char*die;
+   u32 svr;
+   u32 mask;
+};
+
+static struct guts *guts;
+static struct soc_device_attribute soc_dev_attr;

[v16, 6/7] base: soc: Check for NULL SoC device attributes

2016-11-08 Thread Yangbo Lu
From: Geert Uytterhoeven 

If soc_device_match() is used to check the value of a specific
attribute that is not present for the current SoC, the kernel crashes
with a NULL pointer dereference.

Fix this by explicitly checking for the absence of a needed property,
and considering this a non-match.

Signed-off-by: Geert Uytterhoeven 
Acked-by: Arnd Bergmann 
---
Changes for v16:
- Added this patch
---
 drivers/base/soc.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/base/soc.c b/drivers/base/soc.c
index 0c5cf87..0e701e2 100644
--- a/drivers/base/soc.c
+++ b/drivers/base/soc.c
@@ -167,19 +167,23 @@ static int soc_device_match_one(struct device *dev, void 
*arg)
const struct soc_device_attribute *match = arg;
 
if (match->machine &&
-   !glob_match(match->machine, soc_dev->attr->machine))
+   (!soc_dev->attr->machine ||
+!glob_match(match->machine, soc_dev->attr->machine)))
return 0;
 
if (match->family &&
-   !glob_match(match->family, soc_dev->attr->family))
+   (!soc_dev->attr->family ||
+!glob_match(match->family, soc_dev->attr->family)))
return 0;
 
if (match->revision &&
-   !glob_match(match->revision, soc_dev->attr->revision))
+   (!soc_dev->attr->revision ||
+!glob_match(match->revision, soc_dev->attr->revision)))
return 0;
 
if (match->soc_id &&
-   !glob_match(match->soc_id, soc_dev->attr->soc_id))
+   (!soc_dev->attr->soc_id ||
+!glob_match(match->soc_id, soc_dev->attr->soc_id)))
return 0;
 
return 1;
-- 
2.1.0.27.g96db324

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[v16, 0/7] Fix eSDHC host version register bug

2016-11-08 Thread Yangbo Lu
This patchset is used to fix a host version register bug in the T4240-R1.0-R2.0
eSDHC controller. To match the SoC version and revision, 15 previous version
patchsets had tried many methods but all of them were rejected by reviewers.
Such as
- dts compatible method
- syscon method
- ifdef PPC method
- GUTS driver getting SVR method
Anrd suggested a soc_device_match method in v10, and this is the only available
method left now. This v11 patchset introduces the soc_device_match interface in
soc driver.

The first four patches of Yangbo are to add the GUTS driver. This is used to
register a soc device which contain soc version and revision information.
The other three patches introduce the soc_device_match method in soc driver
and apply it on esdhc driver to fix this bug.

---
Changes for v15:
- Dropped patch 'dt: bindings: update Freescale DCFG compatible'
  since the work had been done by below patch on ShawnGuo's linux tree.
  'dt-bindings: fsl: add LS1043A/LS1046A/LS2080A compatible for SCFG
   and DCFG'
- Fixed error code issue in guts driver
Changes for v16:
- Dropped patch 'powerpc/fsl: move mpc85xx.h to include/linux/fsl'
- Added a bug-fix patch from Geert
---

Arnd Bergmann (1):
  base: soc: introduce soc_device_match() interface

Geert Uytterhoeven (1):
  base: soc: Check for NULL SoC device attributes

Yangbo Lu (5):
  ARM64: dts: ls2080a: add device configuration node
  dt: bindings: move guts devicetree doc out of powerpc directory
  soc: fsl: add GUTS driver for QorIQ platforms
  MAINTAINERS: add entry for Freescale SoC drivers
  mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0

 .../bindings/{powerpc => soc}/fsl/guts.txt |   3 +
 MAINTAINERS|  11 +-
 arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi |   6 +
 drivers/base/Kconfig   |   1 +
 drivers/base/soc.c |  70 ++
 drivers/mmc/host/Kconfig   |   1 +
 drivers/mmc/host/sdhci-of-esdhc.c  |  20 ++
 drivers/soc/Kconfig|   3 +-
 drivers/soc/fsl/Kconfig|  18 ++
 drivers/soc/fsl/Makefile   |   1 +
 drivers/soc/fsl/guts.c | 236 +
 include/linux/fsl/guts.h   | 125 ++-
 include/linux/sys_soc.h|   3 +
 13 files changed, 447 insertions(+), 51 deletions(-)
 rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/guts.txt (91%)
 create mode 100644 drivers/soc/fsl/Kconfig
 create mode 100644 drivers/soc/fsl/guts.c

-- 
2.1.0.27.g96db324

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[v16, 2/7] dt: bindings: move guts devicetree doc out of powerpc directory

2016-11-08 Thread Yangbo Lu
Move guts devicetree doc to Documentation/devicetree/bindings/soc/fsl/
since it's used by not only PowerPC but also ARM. And add a specification
for 'little-endian' property.

Signed-off-by: Yangbo Lu 
Acked-by: Rob Herring 
Acked-by: Scott Wood 
Acked-by: Arnd Bergmann 
---
Changes for v4:
- Added this patch
Changes for v5:
- Modified the description for little-endian property
Changes for v6:
- None
Changes for v7:
- None
Changes for v8:
- Added 'Acked-by: Scott Wood'
- Added 'Acked-by: Rob Herring'
Changes for v9:
- None
Changes for v10:
- None
Changes for v11:
- None
Changes for v12:
- None
Changes for v13:
- None
Changes for v14:
- None
Changes for v15:
- None
Changes for v16:
- Added 'Acked-by: Arnd'
---
 Documentation/devicetree/bindings/{powerpc => soc}/fsl/guts.txt | 3 +++
 1 file changed, 3 insertions(+)
 rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/guts.txt (91%)

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/guts.txt 
b/Documentation/devicetree/bindings/soc/fsl/guts.txt
similarity index 91%
rename from Documentation/devicetree/bindings/powerpc/fsl/guts.txt
rename to Documentation/devicetree/bindings/soc/fsl/guts.txt
index b71b203..07adca9 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/guts.txt
+++ b/Documentation/devicetree/bindings/soc/fsl/guts.txt
@@ -25,6 +25,9 @@ Recommended properties:
  - fsl,liodn-bits : Indicates the number of defined bits in the LIODN
registers, for those SOCs that have a PAMU device.
 
+ - little-endian : Indicates that the global utilities block is little
+   endian. The default is big endian.
+
 Examples:
global-utilities@e {/* global utilities block */
compatible = "fsl,mpc8548-guts";
-- 
2.1.0.27.g96db324

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[v16, 5/7] base: soc: introduce soc_device_match() interface

2016-11-08 Thread Yangbo Lu
From: Arnd Bergmann 

We keep running into cases where device drivers want to know the exact
version of the a SoC they are currently running on. In the past, this has
usually been done through a vendor specific API that can be called by a
driver, or by directly accessing some kind of version register that is
not part of the device itself but that belongs to a global register area
of the chip.

Common reasons for doing this include:

- A machine is not using devicetree or similar for passing data about
  on-chip devices, but just announces their presence using boot-time
  platform devices, and the machine code itself does not care about the
  revision.

- There is existing firmware or boot loaders with existing DT binaries
  with generic compatible strings that do not identify the particular
  revision of each device, but the driver knows which SoC revisions
  include which part.

- A prerelease version of a chip has some quirks and we are using the same
  version of the bootloader and the DT blob on both the prerelease and the
  final version. An update of the DT binding seems inappropriate because
  that would involve maintaining multiple copies of the dts and/or
  bootloader.

This patch introduces the soc_device_match() interface that is meant to
work like of_match_node() but instead of identifying the version of a
device, it identifies the SoC itself using a vendor-agnostic interface.

Unlike of_match_node(), we do not do an exact string compare but instead
use glob_match() to allow wildcards in strings.

Signed-off-by: Arnd Bergmann 
Signed-off-by: Yangbo Lu 
Acked-by: Greg Kroah-Hartman 
---
Changes for v11:
- Added this patch for soc match
Changes for v12:
- Corrected the author
- Rewrited soc_device_match with while loop
Changes for v13:
- Added ack from Greg
Changes for v14:
- None
Changes for v15:
- None
Changes for v16:
- None
---
 drivers/base/Kconfig|  1 +
 drivers/base/soc.c  | 66 +
 include/linux/sys_soc.h |  3 +++
 3 files changed, 70 insertions(+)

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index d02e7c0..2abea87 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -237,6 +237,7 @@ config GENERIC_CPU_AUTOPROBE
 
 config SOC_BUS
bool
+   select GLOB
 
 source "drivers/base/regmap/Kconfig"
 
diff --git a/drivers/base/soc.c b/drivers/base/soc.c
index b63f23e..0c5cf87 100644
--- a/drivers/base/soc.c
+++ b/drivers/base/soc.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static DEFINE_IDA(soc_ida);
 
@@ -159,3 +160,68 @@ static int __init soc_bus_register(void)
return bus_register(&soc_bus_type);
 }
 core_initcall(soc_bus_register);
+
+static int soc_device_match_one(struct device *dev, void *arg)
+{
+   struct soc_device *soc_dev = container_of(dev, struct soc_device, dev);
+   const struct soc_device_attribute *match = arg;
+
+   if (match->machine &&
+   !glob_match(match->machine, soc_dev->attr->machine))
+   return 0;
+
+   if (match->family &&
+   !glob_match(match->family, soc_dev->attr->family))
+   return 0;
+
+   if (match->revision &&
+   !glob_match(match->revision, soc_dev->attr->revision))
+   return 0;
+
+   if (match->soc_id &&
+   !glob_match(match->soc_id, soc_dev->attr->soc_id))
+   return 0;
+
+   return 1;
+}
+
+/*
+ * soc_device_match - identify the SoC in the machine
+ * @matches: zero-terminated array of possible matches
+ *
+ * returns the first matching entry of the argument array, or NULL
+ * if none of them match.
+ *
+ * This function is meant as a helper in place of of_match_node()
+ * in cases where either no device tree is available or the information
+ * in a device node is insufficient to identify a particular variant
+ * by its compatible strings or other properties. For new devices,
+ * the DT binding should always provide unique compatible strings
+ * that allow the use of of_match_node() instead.
+ *
+ * The calling function can use the .data entry of the
+ * soc_device_attribute to pass a structure or function pointer for
+ * each entry.
+ */
+const struct soc_device_attribute *soc_device_match(
+   const struct soc_device_attribute *matches)
+{
+   int ret = 0;
+
+   if (!matches)
+   return NULL;
+
+   while (!ret) {
+   if (!(matches->machine || matches->family ||
+ matches->revision || matches->soc_id))
+   break;
+   ret = bus_for_each_dev(&soc_bus_type, NULL, (void *)matches,
+  soc_device_match_one);
+   if (!ret)
+   matches++;
+   else
+   return matches;
+   }
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(soc_device_match);
diff --git a/include/linux/sys_soc.h b/include/linux/sys

[v16, 4/7] MAINTAINERS: add entry for Freescale SoC drivers

2016-11-08 Thread Yangbo Lu
Add maintainer entry for Freescale SoC drivers including
the QE library and the GUTS driver now. Also add maintainer
for QE library.

Signed-off-by: Yangbo Lu 
Acked-by: Scott Wood 
Acked-by: Qiang Zhao 
Acked-by: Arnd Bergmann 
---
Changes for v8:
- Added this patch
Changes for v9:
- Added linux-arm mail list
- Removed GUTS driver entry
Changes for v10:
- Changed 'DRIVER' to 'DRIVERS'
- Added 'Acked-by' of Scott and Qiang
Changes for v11:
- None
Changes for v12:
- None
Changes for v13:
- None
Changes for v14:
- None
Changes for v15:
- None
Changes for v16:
- Added 'Acked-by: Arnd'
---
 MAINTAINERS | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 9be761f..e1a8835 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5045,9 +5045,18 @@ S:   Maintained
 F: drivers/net/ethernet/freescale/fman
 F: Documentation/devicetree/bindings/powerpc/fsl/fman.txt
 
+FREESCALE SOC DRIVERS
+M: Scott Wood 
+L: linuxppc-...@lists.ozlabs.org
+L: linux-arm-ker...@lists.infradead.org
+S: Maintained
+F: drivers/soc/fsl/
+F: include/linux/fsl/
+
 FREESCALE QUICC ENGINE LIBRARY
+M: Qiang Zhao 
 L: linuxppc-...@lists.ozlabs.org
-S: Orphan
+S: Maintained
 F: drivers/soc/fsl/qe/
 F: include/soc/fsl/*qe*.h
 F: include/soc/fsl/*ucc*.h
-- 
2.1.0.27.g96db324

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[v16, 1/7] ARM64: dts: ls2080a: add device configuration node

2016-11-08 Thread Yangbo Lu
Add the dts node for device configuration unit that provides
general purpose configuration and status for the device.

Signed-off-by: Yangbo Lu 
Acked-by: Scott Wood 
Acked-by: Arnd Bergmann 
---
Changes for v5:
- Added this patch
Changes for v6:
- None
Changes for v7:
- None
Changes for v8:
- Added 'Acked-by: Scott Wood'
Changes for v9:
- None
Changes for v10:
- None
Changes for v11:
- None
Changes for v12:
- None
Changes for v13:
- None
Changes for v14:
- None
Changes for v15:
- None
Changes for v16:
- Added 'Acked-by: Arnd'
---
 arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi
index 7f0dc13..d058e56 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi
@@ -216,6 +216,12 @@
clocks = <&sysclk>;
};
 
+   dcfg: dcfg@1e0 {
+   compatible = "fsl,ls2080a-dcfg", "syscon";
+   reg = <0x0 0x1e0 0x0 0x1>;
+   little-endian;
+   };
+
serial0: serial@21c0500 {
compatible = "fsl,ns16550", "ns16550a";
reg = <0x0 0x21c0500 0x0 0x100>;
-- 
2.1.0.27.g96db324

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Summary of LPC guest MSI discussion in Santa Fe

2016-11-08 Thread Don Dutile

On 11/08/2016 06:35 PM, Alex Williamson wrote:

On Tue, 8 Nov 2016 21:29:22 +0100
Christoffer Dall  wrote:


Hi Will,

On Tue, Nov 08, 2016 at 02:45:59AM +, Will Deacon wrote:

Hi all,

I figured this was a reasonable post to piggy-back on for the LPC minutes
relating to guest MSIs on arm64.

On Thu, Nov 03, 2016 at 10:02:05PM -0600, Alex Williamson wrote:

We can always have QEMU reject hot-adding the device if the reserved
region overlaps existing guest RAM, but I don't even really see how we
advise users to give them a reasonable chance of avoiding that
possibility.  Apparently there are also ARM platforms where MSI pages
cannot be remapped to support the previous programmable user/VM
address, is it even worthwhile to support those platforms?  Does that
decision influence whether user programmable MSI reserved regions are
really a second class citizen to fixed reserved regions?  I expect
we'll be talking about this tomorrow morning, but I certainly haven't
come up with any viable solutions to this.  Thanks,


At LPC last week, we discussed guest MSIs on arm64 as part of the PCI
microconference. I presented some slides to illustrate some of the issues
we're trying to solve:

   http://www.willdeacon.ukfsn.org/bitbucket/lpc-16/msi-in-guest-arm64.pdf

Punit took some notes (thanks!) on the etherpad here:

   https://etherpad.openstack.org/p/LPC2016_PCI

although the discussion was pretty lively and jumped about, so I've had
to go from memory where the notes didn't capture everything that was
said.

To summarise, arm64 platforms differ in their handling of MSIs when compared
to x86:

   1. The physical memory map is not standardised (Jon pointed out that
  this is something that was realised late on)
   2. MSIs are usually treated the same as DMA writes, in that they must be
  mapped by the SMMU page tables so that they target a physical MSI
  doorbell
   3. On some platforms, MSIs bypass the SMMU entirely (e.g. due to an MSI
  doorbell built into the PCI RC)
   4. Platforms typically have some set of addresses that abort before
  reaching the SMMU (e.g. because the PCI identifies them as P2P).

All of this means that userspace (QEMU) needs to identify the memory
regions corresponding to points (3) and (4) and ensure that they are
not allocated in the guest physical (IPA) space. For platforms that can
remap the MSI doorbell as in (2), then some space also needs to be
allocated for that.

Rather than treat these as separate problems, a better interface is to
tell userspace about a set of reserved regions, and have this include
the MSI doorbell, irrespective of whether or not it can be remapped.


Is my understanding correct, that you need to tell userspace about the
location of the doorbell (in the IOVA space) in case (2), because even
though the configuration of the device is handled by the (host) kernel
through trapping of the BARs, we have to avoid the VFIO user programming
the device to create other DMA transactions to this particular address,
since that will obviously conflict and either not produce the desired
DMA transactions or result in unintended weird interrupts?


Correct, if the MSI doorbell IOVA range overlaps RAM in the VM, then
it's potentially a DMA target and we'll get bogus data on DMA read from
the device, and lose data and potentially trigger spurious interrupts on
DMA write from the device.  Thanks,

Alex


That's b/c the MSI doorbells are not positioned *above* the SMMU, i.e.,
they address match before the SMMU checks are done.  if
all DMA addrs had to go through SMMU first, then the DMA access could
be ignored/rejected.
For bare-metal, memory can't be put in the same place as MSI addrs, or
DMA could never reach it.  So, only a virt issue, unless the VMs mem address
range mimic the host layout.

- Don

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Summary of LPC guest MSI discussion in Santa Fe (was: Re: [RFC 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 (Alt II))

2016-11-08 Thread Alex Williamson
On Tue, 8 Nov 2016 21:29:22 +0100
Christoffer Dall  wrote:

> Hi Will,
> 
> On Tue, Nov 08, 2016 at 02:45:59AM +, Will Deacon wrote:
> > Hi all,
> > 
> > I figured this was a reasonable post to piggy-back on for the LPC minutes
> > relating to guest MSIs on arm64.
> > 
> > On Thu, Nov 03, 2016 at 10:02:05PM -0600, Alex Williamson wrote:  
> > > We can always have QEMU reject hot-adding the device if the reserved
> > > region overlaps existing guest RAM, but I don't even really see how we
> > > advise users to give them a reasonable chance of avoiding that
> > > possibility.  Apparently there are also ARM platforms where MSI pages
> > > cannot be remapped to support the previous programmable user/VM
> > > address, is it even worthwhile to support those platforms?  Does that
> > > decision influence whether user programmable MSI reserved regions are
> > > really a second class citizen to fixed reserved regions?  I expect
> > > we'll be talking about this tomorrow morning, but I certainly haven't
> > > come up with any viable solutions to this.  Thanks,  
> > 
> > At LPC last week, we discussed guest MSIs on arm64 as part of the PCI
> > microconference. I presented some slides to illustrate some of the issues
> > we're trying to solve:
> > 
> >   http://www.willdeacon.ukfsn.org/bitbucket/lpc-16/msi-in-guest-arm64.pdf
> > 
> > Punit took some notes (thanks!) on the etherpad here:
> > 
> >   https://etherpad.openstack.org/p/LPC2016_PCI
> > 
> > although the discussion was pretty lively and jumped about, so I've had
> > to go from memory where the notes didn't capture everything that was
> > said.
> > 
> > To summarise, arm64 platforms differ in their handling of MSIs when compared
> > to x86:
> > 
> >   1. The physical memory map is not standardised (Jon pointed out that
> >  this is something that was realised late on)
> >   2. MSIs are usually treated the same as DMA writes, in that they must be
> >  mapped by the SMMU page tables so that they target a physical MSI
> >  doorbell
> >   3. On some platforms, MSIs bypass the SMMU entirely (e.g. due to an MSI
> >  doorbell built into the PCI RC)
> >   4. Platforms typically have some set of addresses that abort before
> >  reaching the SMMU (e.g. because the PCI identifies them as P2P).
> > 
> > All of this means that userspace (QEMU) needs to identify the memory
> > regions corresponding to points (3) and (4) and ensure that they are
> > not allocated in the guest physical (IPA) space. For platforms that can
> > remap the MSI doorbell as in (2), then some space also needs to be
> > allocated for that.
> > 
> > Rather than treat these as separate problems, a better interface is to
> > tell userspace about a set of reserved regions, and have this include
> > the MSI doorbell, irrespective of whether or not it can be remapped.  
> 
> Is my understanding correct, that you need to tell userspace about the
> location of the doorbell (in the IOVA space) in case (2), because even
> though the configuration of the device is handled by the (host) kernel
> through trapping of the BARs, we have to avoid the VFIO user programming
> the device to create other DMA transactions to this particular address,
> since that will obviously conflict and either not produce the desired
> DMA transactions or result in unintended weird interrupts?

Correct, if the MSI doorbell IOVA range overlaps RAM in the VM, then
it's potentially a DMA target and we'll get bogus data on DMA read from
the device, and lose data and potentially trigger spurious interrupts on
DMA write from the device.  Thanks,

Alex
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 2/2] iommu/exynos: Add proper runtime pm support

2016-11-08 Thread Luis R. Rodriguez
On Mon, Oct 10, 2016 at 03:32:06PM +0200, Marek Szyprowski wrote:
> Hi Luis
> 
> 
> On 2016-10-06 19:37, Luis R. Rodriguez wrote:
> > On Thu, Sep 29, 2016 at 10:12:31AM +0200, Marek Szyprowski wrote:
> > > This patch uses recently introduced device links to track the runtime pm
> > > state of the master's device. This way each SYSMMU controller is runtime
> > > activated when its master's device is active
> > instead of?
> 
> instead of keeping SYSMMU controller runtime active all the time.

I thought Rafael's work was for suspend/resume, not for runtime suspend.
Is it for both ? Because as far as I can tell this was painted to help
with suspend/resume ?

> > BTW what is the master device of a SYSMMU? I have no clue about these
> > IOMMU devices here.
> 
> Here is a more detailed description of IOMMU hardware I wrote a few days ago
> for Ulf:
> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1231006.html
> 
> In short: there is a SYSMMU controller and its master device - a device,
> which performs DMA operations. That SYSMMU sits in between system memory
> and the master device, so it performs mapping of DMA addresses to physical
> memory addresses on each DMA operation.

So you seek a run time power optimization ? Or a fix on suspend? Or both?

> > > and can save/restore its state instead of being enabled all the time.
> > I take it this means currently even if the master device is disabled
> > (whatever that is) all SYSMMU controllers are kept enabled, is that right?
> > The issue here is this wastes power? Or what is the issue?
> 
> Yes, the issue here is the fact that SYSMMU is kept active all the time,
> what in turn prevent the power domain for turning off even if master device
> doesn't do anything and is already suspended. This directly (some clocks
> enabled) and in-directly (leakage current) causes power looses.

Thanks for the confirmation so really the biggest concern here was run time PM.

> > > This way SYSMMU controllers no
> > > longer prevents respective power domains to be turned off when master's
> > > device is not used.
> > So when the master device is idle we want to also remove power from the
> > controllers ? How much power does this save on a typical device in the
> > market BTW ?
> 
> The main purpose of this patchset is to let power domains to be turned off,
> because with the current code all domains are all the time turned on,
> because
> SYSMMU controllers prevent them from turning them off.

I see.. I think the audio folks already addressed this with DAPM, but granted
this was for audio. Then I was also referred to the DRM / Audio component
framework, has this been looked into? v4l folks have v4l async stuff but
its not clear if that help with run time PM. I'm mentioning these given it'd be
silly to re-invent the wheel, additionally if we now have a generic solution
everyone can jump on board with there is quite a bit of work we can do to
dump a lot of old legacy crap.

> If you want I can measure the power consumption of the idle board with all
> domains enabled and disabled if you want to see the numbers. On the other
> board
> disabling most power domains in idle state (the clocks were already
> disabled)
> gave me about 20mA savings (at 3.7V), what is a significant value for the
> battery powered device.

Thanks, this means nothing to me, however it would be value-add to the commit 
log
as anyone reviewing  this can understand what the goal / savings was for 
exactly.

> > 
> > > Signed-off-by: Marek Szyprowski 
> > > ---
> > >   drivers/iommu/exynos-iommu.c | 225 
> > > ++-
> > >   1 file changed, 94 insertions(+), 131 deletions(-)
> > I'm reviewing the device link patches now but since this is a demo of
> > use of that I'll note the changes here are pretty large and it makes
> > it terribly difficult for review. Is there any way this patch can be split
> > up in to logical atomic pieces that only do one task upon change ?
> 
> I will try to split it a bit, but I cannot promise that much can be done
> to improve readability for someone not very familiar with the driver
> internals.

I've heard this before, I don't buy it but lets see!

  Luis
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Summary of LPC guest MSI discussion in Santa Fe (was: Re: [RFC 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 (Alt II))

2016-11-08 Thread Christoffer Dall
Hi Will,

On Tue, Nov 08, 2016 at 02:45:59AM +, Will Deacon wrote:
> Hi all,
> 
> I figured this was a reasonable post to piggy-back on for the LPC minutes
> relating to guest MSIs on arm64.
> 
> On Thu, Nov 03, 2016 at 10:02:05PM -0600, Alex Williamson wrote:
> > We can always have QEMU reject hot-adding the device if the reserved
> > region overlaps existing guest RAM, but I don't even really see how we
> > advise users to give them a reasonable chance of avoiding that
> > possibility.  Apparently there are also ARM platforms where MSI pages
> > cannot be remapped to support the previous programmable user/VM
> > address, is it even worthwhile to support those platforms?  Does that
> > decision influence whether user programmable MSI reserved regions are
> > really a second class citizen to fixed reserved regions?  I expect
> > we'll be talking about this tomorrow morning, but I certainly haven't
> > come up with any viable solutions to this.  Thanks,
> 
> At LPC last week, we discussed guest MSIs on arm64 as part of the PCI
> microconference. I presented some slides to illustrate some of the issues
> we're trying to solve:
> 
>   http://www.willdeacon.ukfsn.org/bitbucket/lpc-16/msi-in-guest-arm64.pdf
> 
> Punit took some notes (thanks!) on the etherpad here:
> 
>   https://etherpad.openstack.org/p/LPC2016_PCI
> 
> although the discussion was pretty lively and jumped about, so I've had
> to go from memory where the notes didn't capture everything that was
> said.
> 
> To summarise, arm64 platforms differ in their handling of MSIs when compared
> to x86:
> 
>   1. The physical memory map is not standardised (Jon pointed out that
>  this is something that was realised late on)
>   2. MSIs are usually treated the same as DMA writes, in that they must be
>  mapped by the SMMU page tables so that they target a physical MSI
>  doorbell
>   3. On some platforms, MSIs bypass the SMMU entirely (e.g. due to an MSI
>  doorbell built into the PCI RC)
>   4. Platforms typically have some set of addresses that abort before
>  reaching the SMMU (e.g. because the PCI identifies them as P2P).
> 
> All of this means that userspace (QEMU) needs to identify the memory
> regions corresponding to points (3) and (4) and ensure that they are
> not allocated in the guest physical (IPA) space. For platforms that can
> remap the MSI doorbell as in (2), then some space also needs to be
> allocated for that.
> 
> Rather than treat these as separate problems, a better interface is to
> tell userspace about a set of reserved regions, and have this include
> the MSI doorbell, irrespective of whether or not it can be remapped.

Is my understanding correct, that you need to tell userspace about the
location of the doorbell (in the IOVA space) in case (2), because even
though the configuration of the device is handled by the (host) kernel
through trapping of the BARs, we have to avoid the VFIO user programming
the device to create other DMA transactions to this particular address,
since that will obviously conflict and either not produce the desired
DMA transactions or result in unintended weird interrupts?

Thanks,
Christoffer
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Summary of LPC guest MSI discussion in Santa Fe

2016-11-08 Thread Will Deacon
On Tue, Nov 08, 2016 at 02:02:39PM -0500, Don Dutile wrote:
> On 11/08/2016 12:54 PM, Will Deacon wrote:
> >A first step would be making all this opt-in, and only supporting GICv3
> >ITS for now.
> You're trying to support a config that is < GICv3 and no ITS ? ...
> That would be the equiv. of x86 pre-intr-remap, and that's why 
> allow_unsafe_interrupts
> hook was created ... to enable devel/kick-the-tires.

Yup, that's exactly what I was envisaging. For systems that can't do
passthrough safely, we'll always have to throw a devel switch.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Summary of LPC guest MSI discussion in Santa Fe

2016-11-08 Thread Don Dutile

On 11/08/2016 12:54 PM, Will Deacon wrote:

On Tue, Nov 08, 2016 at 03:27:23PM +0100, Auger Eric wrote:

On 08/11/2016 03:45, Will Deacon wrote:

Rather than treat these as separate problems, a better interface is to
tell userspace about a set of reserved regions, and have this include
the MSI doorbell, irrespective of whether or not it can be remapped.
Don suggested that we statically pick an address for the doorbell in a
similar way to x86, and have the kernel map it there. We could even pick
0xfee0. If it conflicts with a reserved region on the platform (due
to (4)), then we'd obviously have to (deterministically?) allocate it
somewhere else, but probably within the bottom 4G.

This is tentatively achieved now with
[1] [RFC v2 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 - Alt II
(http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1264506.html)

Yup, I saw that fly by. Hopefully some of the internals can be reused
with the current thinking on user ABI.


The next question is how to tell userspace about all of the reserved
regions. Initially, the idea was to extend VFIO, however Alex pointed
out a horrible scenario:

   1. QEMU spawns a VM on system 0
   2. VM is migrated to system 1
   3. QEMU attempts to passthrough a device using PCI hotplug

In this scenario, the guest memory map is chosen at step (1), yet there
is no VFIO fd available to determine the reserved regions. Furthermore,
the reserved regions may vary between system 0 and system 1. This pretty
much rules out using VFIO to determine the reserved regions.Alex suggested
that the SMMU driver can advertise the regions via /sys/class/iommu/. This
would solve part of the problem, but migration between systems with
different memory maps can still cause problems if the reserved regions
of the new system conflict with the guest memory map chosen by QEMU.


OK so I understand we do not want anymore the VFIO chain capability API
(patch 5 of above series) but we prefer a sysfs approach instead.

Right.


I understand the sysfs approach which allows the userspace to get the
info earlier and independently on VFIO. Keeping in mind current QEMU
virt - which is not the only userspace - will not do much from this info
until we bring upheavals in virt address space management. So if I am
not wrong, at the moment the main action to be undertaken is the
rejection of the PCI hotplug in case we detect a collision.

I don't think so; it should be up to userspace to reject the hotplug.
If userspace doesn't have support for the regions, then that's fine --
you just end up in a situation where the CPU page table maps memory
somewhere that the device can't see. In other words, you'll end up with
spurious DMA failures, but that's exactly what happens with current systems
if you passthrough an overlapping region (Robin demonstrated this on Juno).

Additionally, you can imagine some future support where you can tell the
guest not to use certain regions of its memory for DMA. In this case, you
wouldn't want to refuse the hotplug in the case of overlapping regions.

Really, I think the kernel side just needs to enumerate the fixed reserved
regions, place the doorbell at a fixed address and then advertise these
via sysfs.


I can respin [1]
- studying and taking into account Robin's comments about dm_regions
similarities
- removing the VFIO capability chain and replacing this by a sysfs API

Ideally, this would be reusable between different SMMU drivers so the sysfs
entries have the same format etc.


Would that be OK?

Sounds good to me. Are you in a position to prototype something on the qemu
side once we've got kernel-side agreement?


What about Alex comments who wanted to report the usable memory ranges
instead of unusable memory ranges?

Also did you have a chance to discuss the following items:
1) the VFIO irq safety assessment

The discussion really focussed on system topology, as opposed to properties
of the doorbell. Regardless of how the device talks to the doorbell, if
the doorbell can't protect against things like MSI spoofing, then it's
unsafe. My opinion is that we shouldn't allow passthrough by default on
systems with unsafe doorbells (we could piggyback on allow_unsafe_interrupts
cmdline option to VFIO).

A first step would be making all this opt-in, and only supporting GICv3
ITS for now.

You're trying to support a config that is < GICv3 and no ITS ? ...
That would be the equiv. of x86 pre-intr-remap, and that's why 
allow_unsafe_interrupts
hook was created ... to enable devel/kick-the-tires.

2) the MSI reserved size computation (is an arbitrary size OK?)

If we fix the base address, we could fix a size too. However, we'd still
need to enumerate the doorbells to check that they fit in the region we
have. If not, then we can warn during boot and treat it the same way as
a resource conflict (that is, reallocate the region in some deterministic
way).

Will


___
iommu mailing list
iommu@lists.linux-foun

Re: Summary of LPC guest MSI discussion in Santa Fe

2016-11-08 Thread Will Deacon
On Tue, Nov 08, 2016 at 03:27:23PM +0100, Auger Eric wrote:
> On 08/11/2016 03:45, Will Deacon wrote:
> > Rather than treat these as separate problems, a better interface is to
> > tell userspace about a set of reserved regions, and have this include
> > the MSI doorbell, irrespective of whether or not it can be remapped.
> > Don suggested that we statically pick an address for the doorbell in a
> > similar way to x86, and have the kernel map it there. We could even pick
> > 0xfee0. If it conflicts with a reserved region on the platform (due
> > to (4)), then we'd obviously have to (deterministically?) allocate it
> > somewhere else, but probably within the bottom 4G.
> 
> This is tentatively achieved now with
> [1] [RFC v2 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 - Alt II
> (http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1264506.html)

Yup, I saw that fly by. Hopefully some of the internals can be reused
with the current thinking on user ABI.

> > The next question is how to tell userspace about all of the reserved
> > regions. Initially, the idea was to extend VFIO, however Alex pointed
> > out a horrible scenario:
> > 
> >   1. QEMU spawns a VM on system 0
> >   2. VM is migrated to system 1
> >   3. QEMU attempts to passthrough a device using PCI hotplug
> > 
> > In this scenario, the guest memory map is chosen at step (1), yet there
> > is no VFIO fd available to determine the reserved regions. Furthermore,
> > the reserved regions may vary between system 0 and system 1. This pretty
> > much rules out using VFIO to determine the reserved regions.Alex suggested
> > that the SMMU driver can advertise the regions via /sys/class/iommu/. This
> > would solve part of the problem, but migration between systems with
> > different memory maps can still cause problems if the reserved regions
> > of the new system conflict with the guest memory map chosen by QEMU.
> 
> 
> OK so I understand we do not want anymore the VFIO chain capability API
> (patch 5 of above series) but we prefer a sysfs approach instead.

Right.

> I understand the sysfs approach which allows the userspace to get the
> info earlier and independently on VFIO. Keeping in mind current QEMU
> virt - which is not the only userspace - will not do much from this info
> until we bring upheavals in virt address space management. So if I am
> not wrong, at the moment the main action to be undertaken is the
> rejection of the PCI hotplug in case we detect a collision.

I don't think so; it should be up to userspace to reject the hotplug.
If userspace doesn't have support for the regions, then that's fine --
you just end up in a situation where the CPU page table maps memory
somewhere that the device can't see. In other words, you'll end up with
spurious DMA failures, but that's exactly what happens with current systems
if you passthrough an overlapping region (Robin demonstrated this on Juno).

Additionally, you can imagine some future support where you can tell the
guest not to use certain regions of its memory for DMA. In this case, you
wouldn't want to refuse the hotplug in the case of overlapping regions.

Really, I think the kernel side just needs to enumerate the fixed reserved
regions, place the doorbell at a fixed address and then advertise these
via sysfs.

> I can respin [1]
> - studying and taking into account Robin's comments about dm_regions
> similarities
> - removing the VFIO capability chain and replacing this by a sysfs API

Ideally, this would be reusable between different SMMU drivers so the sysfs
entries have the same format etc.

> Would that be OK?

Sounds good to me. Are you in a position to prototype something on the qemu
side once we've got kernel-side agreement?

> What about Alex comments who wanted to report the usable memory ranges
> instead of unusable memory ranges?
> 
> Also did you have a chance to discuss the following items:
> 1) the VFIO irq safety assessment

The discussion really focussed on system topology, as opposed to properties
of the doorbell. Regardless of how the device talks to the doorbell, if
the doorbell can't protect against things like MSI spoofing, then it's
unsafe. My opinion is that we shouldn't allow passthrough by default on
systems with unsafe doorbells (we could piggyback on allow_unsafe_interrupts
cmdline option to VFIO).

A first step would be making all this opt-in, and only supporting GICv3
ITS for now.

> 2) the MSI reserved size computation (is an arbitrary size OK?)

If we fix the base address, we could fix a size too. However, we'd still
need to enumerate the doorbells to check that they fit in the region we
have. If not, then we can warn during boot and treat it the same way as
a resource conflict (that is, reallocate the region in some deterministic
way).

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Summary of LPC guest MSI discussion in Santa Fe

2016-11-08 Thread Don Dutile

On 11/07/2016 09:45 PM, Will Deacon wrote:

Hi all,

I figured this was a reasonable post to piggy-back on for the LPC minutes
relating to guest MSIs on arm64.

On Thu, Nov 03, 2016 at 10:02:05PM -0600, Alex Williamson wrote:

We can always have QEMU reject hot-adding the device if the reserved
region overlaps existing guest RAM, but I don't even really see how we
advise users to give them a reasonable chance of avoiding that
possibility.  Apparently there are also ARM platforms where MSI pages
cannot be remapped to support the previous programmable user/VM
address, is it even worthwhile to support those platforms?  Does that
decision influence whether user programmable MSI reserved regions are
really a second class citizen to fixed reserved regions?  I expect
we'll be talking about this tomorrow morning, but I certainly haven't
come up with any viable solutions to this.  Thanks,


At LPC last week, we discussed guest MSIs on arm64 as part of the PCI
microconference. I presented some slides to illustrate some of the issues
we're trying to solve:

   http://www.willdeacon.ukfsn.org/bitbucket/lpc-16/msi-in-guest-arm64.pdf

Punit took some notes (thanks!) on the etherpad here:

   https://etherpad.openstack.org/p/LPC2016_PCI

although the discussion was pretty lively and jumped about, so I've had
to go from memory where the notes didn't capture everything that was
said.

To summarise, arm64 platforms differ in their handling of MSIs when compared
to x86:

   1. The physical memory map is not standardised (Jon pointed out that
  this is something that was realised late on)
   2. MSIs are usually treated the same as DMA writes, in that they must be
  mapped by the SMMU page tables so that they target a physical MSI
  doorbell
   3. On some platforms, MSIs bypass the SMMU entirely (e.g. due to an MSI
  doorbell built into the PCI RC)

Chaulk this case to 'the learning curve'.
Q35 chipset (the one being use for x86-PCIe qemu model) had no intr-remap hw,
only DMA addrs destined for real memory. assigned-device intrs had to be caught
by kvm & injected into guests, and yes, a DoS was possible... and thus,
the intr-remap support being done after initial iommu support.



   4. Platforms typically have some set of addresses that abort before
  reaching the SMMU (e.g. because the PCI identifies them as P2P).

ARM platforms that don't implement the equivalent of ACS (in PCI bridges within
a PCIe switch) are either not device-assignment capable, or the IOMMU domain
expands across the entire peer-to-peer (sub-)tree.
ACS(-like) functionality is a fundamental component to the security model,
as is the IOMMU itself.  Without it, it's equivalent to not having an IOMMU.

Dare I ask?: Can these systems, or parts of these systems, just be deemed
"incomplete" or "not 100% secure" wrt device assignment, and other systems
can or will be ???
Not much different then the first x86 systems that tried to get it right
the first few times... :-/
I'm hearing (parts of) systems that are just not properly designed
for device-assignment use-case, probably b/c this (system) architecture
hasn't been pulled together from the variouis component architectures
(CPU, SMMU, IOT, etc.).



All of this means that userspace (QEMU) needs to identify the memory
regions corresponding to points (3) and (4) and ensure that they are
not allocated in the guest physical (IPA) space. For platforms that can
remap the MSI doorbell as in (2), then some space also needs to be
allocated for that.


Again, proper ACS features/control eliminates this need.
A (multi-function) device should never be able to perform
IO to itself via its PCIe interface.  Bridge-ACS pushes everything
up to SMMU for desitination resolution.
Without ACS, I don't see how a guest is migratible from one system to
another, unless the system-migration-group consists of system that
are exactly the same (wrt IO) [less/more system memory &/or cpu does
not affect VM system map.
 
Again, the initial Linux implementation did not have ACS,

but was 'resolved' by the default/common system mapping putting the PCI devices
into an area that was blocked from memory use (generally 3G->4G space).
ARM may not have that single, simple implementation, but a method to
indicated reserved regions, and then a check for matching/non-matching reserved
regions for guest migration, is the only way I see to resolve this issue
until ACS is sufficiently supported int the hw subsystems to be used
for device-assignment.


Rather than treat these as separate problems, a better interface is to
tell userspace about a set of reserved regions, and have this include
the MSI doorbell, irrespective of whether or not it can be remapped.
Don suggested that we statically pick an address for the doorbell in a
similar way to x86, and have the kernel map it there. We could even pick
0xfee0.

I suggest picking a 'relative-fixed' address: the last n-pages of system memory
address space, i.e.,
   0xfff[]fee0.  

Re: [v15, 3/7] powerpc/fsl: move mpc85xx.h to include/linux/fsl

2016-11-08 Thread Arnd Bergmann
On Tuesday, November 8, 2016 6:49:51 AM CET Y.B. Lu wrote:
> Hi Arnd,
> 
> 
> > -Original Message-
> > From: Arnd Bergmann [mailto:a...@arndb.de]
> > Sent: Tuesday, November 08, 2016 5:20 AM
> > To: Y.B. Lu
> > Cc: linuxppc-...@lists.ozlabs.org; linux-...@vger.kernel.org;
> > ulf.hans...@linaro.org; Scott Wood; Mark Rutland; Greg Kroah-Hartman; X.B.
> > Xie; M.H. Lian; linux-...@vger.kernel.org; linux-...@vger.kernel.org;
> > Qiang Zhao; Russell King; Bhupesh Sharma; Joerg Roedel; Claudiu Manoil;
> > devicet...@vger.kernel.org; Rob Herring; Santosh Shilimkar; linux-arm-
> > ker...@lists.infradead.org; net...@vger.kernel.org; linux-
> > ker...@vger.kernel.org; Leo Li; iommu@lists.linux-foundation.org; Kumar
> > Gala
> > Subject: Re: [v15, 3/7] powerpc/fsl: move mpc85xx.h to include/linux/fsl
> > 
> > On Monday, October 31, 2016 9:35:33 AM CET Y.B. Lu wrote:
> > > >
> > > > I don't see any of the contents of this header referenced by the soc
> > > > driver any more. I think you can just drop this patch.
> > > >
> > >
> > > [Lu Yangbo-B47093] This header file was included by guts.c.
> > > The guts driver used macro SVR_MAJ/SVR_MIN for calculation.
> > >
> > > This header file was for powerpc arch before. And this patch is to
> > > made it as common header file for both ARM and PPC.
> > > Sooner or later this is needed.
> > 
> > Let's discuss it once we actually need the header then, ok?
> 
> [Lu Yangbo-B47093] As I said, this header file was included by guts.c in 
> patch 4.

Ah sorry, I misread your earlier reply, thinking you meant a potential
future patch.

> The guts driver used macro SVR_MAJ/SVR_MIN for calculation which were
> defined in this header file.
> Did you suggest we dropped this patch and just calculated them in driver?

That is probably nicer here: there is not that much value in sharing
the two one-line macro definitions, and the driver already hardcodes
the numeric per-chip IDs that make up most of the header file.

Arnd
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[git pull] IOMMU Fixes for Linux v4.9-rc4

2016-11-08 Thread Joerg Roedel
Hi Linus,

The following changes since commit bc33b0ca11e3df46a4fa7639ba488c9d4911:

  Linux 4.9-rc4 (2016-11-05 16:23:36 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git 
tags/iommu-fixes-v4.9-rc4

for you to fetch changes up to bea64033dd7b5fb6296eda8266acab6364ce1554:

  iommu/vt-d: Fix dead-locks in disable_dmar_iommu() path (2016-11-08 15:08:26 
+0100)


IOMMU-Fixes for Linux v4.9-rc4

Two places need fixing:

* Four patches from Robin Murphy fix several issues with the
  recently merged generic DT-bindings support for arm-smmu
  drivers

* A fix for a dead-lock issue in the VT-d driver, which shows up
  on iommu hotplug


Joerg Roedel (1):
  iommu/vt-d: Fix dead-locks in disable_dmar_iommu() path

Robin Murphy (4):
  iommu/arm-smmu: Work around ARM DMA configuration
  iommu/arm-smmu: Don't inadvertently reject multiple SMMUv3s
  iommu/arm-smmu: Check that iommu_fwspecs are ours
  iommu/arm-smmu: Fix out-of-bounds dereference

 drivers/iommu/arm-smmu-v3.c | 25 +
 drivers/iommu/arm-smmu.c| 16 ++--
 drivers/iommu/intel-iommu.c | 14 --
 3 files changed, 43 insertions(+), 12 deletions(-)

Please pull.

Thanks,

Joerg


signature.asc
Description: Digital signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v5 7/7] iommu/exynos: Use device dependency links to control runtime pm

2016-11-08 Thread Lukas Wunner
On Tue, Nov 08, 2016 at 08:27:12AM +0100, Marek Szyprowski wrote:
> On 2016-11-07 22:47, Luis R. Rodriguez wrote:
> > Has there been any review of the existing similar solutions out there
> > such as the DRM / audio component framework? Would that help ?
> 
> Nope, none of that solution deals with runtime pm.

Well, they do.  Hybrid graphics laptops often have an HDA controller
on the discrete GPU and I assume that's what Luis meant. There's code
in drivers/gpu/vga/vga_switcheroo.c to make this (only sort of) work:

* When the GPU is powered up/down, the HDA controller's driver is
  instructed to pm_runtime_get/put the HDA device (see call to
  set_audio_state() in vga_switcheroo_set_dynamic_switch()).

* When a runtime PM ref is acquired on the HDA device, the
  GPU is powered up (see vga_switcheroo_runtime_resume_hdmi_audio()).


Unfortunately this is all fairly broken, e.g.:

* If a runtime PM ref on the HDA device is held for more than 5 sec
  (autosuspend delay of the GPU), the GPU will be powered down and
  the HDA device will become inaccessible, regardless of the runtime
  PM ref still being held (because vga_switcheroo_set_dynamic_switch()
  doesn't check the refcount of the HDA device).

* The DRM device is afforded direct-complete but the HDA device is not.
  If the GPU is handled earlier by dpm_suspend(), then runtime PM will
  have been disabled on the GPU and thus the HDA device will fail to
  runtime resume before system sleep.

Rafael's series allows representation of such inter-device dependencies
in the PM core and can thus replace kludgy and broken "solutions" like
the one above.

There's one thing that I haven't understood myself though:  In an e-mail
exchange in September Rafael has argued that the above-mentioned hybrid
graphics use case "isn't a good [example] IMO.  That clearly is a case
when two (or more) devices share power resources controlled by a single
on/off switch.  Which is a clear use case for a PM domain."

The same seems to apply to Marek's SYSMMU use case.  When applying device
links to SYSMMU or hybrid graphics, we select one of the devices in the
PM domain as master and have the other one depend on it as slave, i.e.
a synthetic hierarchical relationship is established.

I've responded to Rafael on September 18 that this can't be solved with
a struct dev_pm_domain, but haven't received a reply since:
https://lkml.org/lkml/2016/9/18/103

Thanks,

Lukas
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/dma-iommu: properly respect configured address space size

2016-11-08 Thread Marek Szyprowski

Hi Robin,


On 2016-11-08 15:44, Robin Murphy wrote:

On 08/11/16 13:41, Marek Szyprowski wrote:

On 2016-11-08 12:37, Robin Murphy wrote:

On 07/11/16 13:06, Marek Szyprowski wrote:

When one called iommu_dma_init_domain() with size smaller than device's
DMA mask, the alloc_iova() will not respect it and always assume that
all
IOVA addresses will be allocated from the the (base ...
dev->dma_mask) range.

Is that actually a problem for anything?

Yes, I found this issue while working on next version of ARM & ARM64
DMA-mapping/IOMMU integration patchset and adapting Exynos drivers for the
new IOMMU/DMA-mapping glue.

Some Exynos devices (codec and camera isp) operate only on the limited (and
really small: 256M for example) DMA window. They use non-standard way of
addressing memory: an offset from the firmware base. However they still
have
32bit DMA mask, as the firmware can be located basically everywhere in the
real DMA address space, but then they can access only next 256M from that
firmware base.

OK, that's good to know, thanks. However, I think in this case it sounds
like it's really the DMA mask that's the underlying problem - if those
blocks themselves can only drive 28 address bits, then the struct
devices representing them should have 28-bit DMA masks, and the
"firmware base" of whoever's driving the upper bits modelled with a
dma_pfn_offset. Even if they do have full 32-bit interfaces themselves,
but are constrained to segment-offset addressing internally, I still
think it would be tidier to represent things that way.

At some point in dma-iommu development I did have support for DMA
offsets upstream of the IOMMU, and am happy to reinstate it if there's a
real use case (assuming you can't simply always set the firmware base to
0 when using the IOMMU).


That would indeed look a bit simpler, but I've already tried such approach
and the firmware crashes when its base in real DMA address space is set to
zero.


This patch fixes this issue by taking the configured address space size
parameter into account (if it is smaller than the device's dma_mask).

TBH I've been pondering ripping the size stuff out of dma-iommu, as it
all stems from me originally failing to understand what dma_32bit_pfn is
actually for. The truth is that iova_domains just don't have a size or
upper limit; however if devices with both large and small DMA masks
share a domain, then the top-down nature of the allocator means that
allocating for the less-capable devices would involve walking through
every out-of-range entry in the tree every time. Having cached32_node
based on dma_32bit_pfn just provides an optimised starting point for
searching within the smaller mask.

Right, this dma_32bit_pfn was really misleading at the first glance,
but then I found that this was something like end_pfn in case of dma-iommu
code.

Yes, that was my incorrect assumption - at the time I interpreted it as
a de-facto upper limit which was still possible to allocate above in
special circumstances, which turns out to be almost entirely backwards.
I'd rather not bake that into the dma-iommu code any further if we can
avoid it (I'll try throwing an RFC together to clear up what's there
already).


Okay.

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/dma-iommu: properly respect configured address space size

2016-11-08 Thread Robin Murphy
On 08/11/16 13:41, Marek Szyprowski wrote:
> Hi Robin,
> 
> 
> On 2016-11-08 12:37, Robin Murphy wrote:
>> On 07/11/16 13:06, Marek Szyprowski wrote:
>>> When one called iommu_dma_init_domain() with size smaller than device's
>>> DMA mask, the alloc_iova() will not respect it and always assume that
>>> all
>>> IOVA addresses will be allocated from the the (base ...
>>> dev->dma_mask) range.
>> Is that actually a problem for anything?
> 
> Yes, I found this issue while working on next version of ARM & ARM64
> DMA-mapping/IOMMU integration patchset and adapting Exynos drivers for the
> new IOMMU/DMA-mapping glue.
> 
> Some Exynos devices (codec and camera isp) operate only on the limited (and
> really small: 256M for example) DMA window. They use non-standard way of
> addressing memory: an offset from the firmware base. However they still
> have
> 32bit DMA mask, as the firmware can be located basically everywhere in the
> real DMA address space, but then they can access only next 256M from that
> firmware base.

OK, that's good to know, thanks. However, I think in this case it sounds
like it's really the DMA mask that's the underlying problem - if those
blocks themselves can only drive 28 address bits, then the struct
devices representing them should have 28-bit DMA masks, and the
"firmware base" of whoever's driving the upper bits modelled with a
dma_pfn_offset. Even if they do have full 32-bit interfaces themselves,
but are constrained to segment-offset addressing internally, I still
think it would be tidier to represent things that way.

At some point in dma-iommu development I did have support for DMA
offsets upstream of the IOMMU, and am happy to reinstate it if there's a
real use case (assuming you can't simply always set the firmware base to
0 when using the IOMMU).

>>> This patch fixes this issue by taking the configured address space size
>>> parameter into account (if it is smaller than the device's dma_mask).
>> TBH I've been pondering ripping the size stuff out of dma-iommu, as it
>> all stems from me originally failing to understand what dma_32bit_pfn is
>> actually for. The truth is that iova_domains just don't have a size or
>> upper limit; however if devices with both large and small DMA masks
>> share a domain, then the top-down nature of the allocator means that
>> allocating for the less-capable devices would involve walking through
>> every out-of-range entry in the tree every time. Having cached32_node
>> based on dma_32bit_pfn just provides an optimised starting point for
>> searching within the smaller mask.
> 
> Right, this dma_32bit_pfn was really misleading at the first glance,
> but then I found that this was something like end_pfn in case of dma-iommu
> code.

Yes, that was my incorrect assumption - at the time I interpreted it as
a de-facto upper limit which was still possible to allocate above in
special circumstances, which turns out to be almost entirely backwards.
I'd rather not bake that into the dma-iommu code any further if we can
avoid it (I'll try throwing an RFC together to clear up what's there
already).

Robin.

>> Would it hurt any of your use-cases to relax/rework the reinitialisation
>> checks in iommu_dma_init_domain()? Alternatively if we really do have a
>> case for wanting a hard upper limit, it might make more sense to add an
>> end_pfn to the iova_domain and handle it in the allocator itself.
> 
> The proper support for end_pfn would be a better approach probably,
> especially if we consider readability of the code.
> 
> Best regards

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Summary of LPC guest MSI discussion in Santa Fe

2016-11-08 Thread Auger Eric
Hi Will,

On 08/11/2016 03:45, Will Deacon wrote:
> Hi all,
> 
> I figured this was a reasonable post to piggy-back on for the LPC minutes
> relating to guest MSIs on arm64.
> 
> On Thu, Nov 03, 2016 at 10:02:05PM -0600, Alex Williamson wrote:
>> We can always have QEMU reject hot-adding the device if the reserved
>> region overlaps existing guest RAM, but I don't even really see how we
>> advise users to give them a reasonable chance of avoiding that
>> possibility.  Apparently there are also ARM platforms where MSI pages
>> cannot be remapped to support the previous programmable user/VM
>> address, is it even worthwhile to support those platforms?  Does that
>> decision influence whether user programmable MSI reserved regions are
>> really a second class citizen to fixed reserved regions?  I expect
>> we'll be talking about this tomorrow morning, but I certainly haven't
>> come up with any viable solutions to this.  Thanks,
> 
> At LPC last week, we discussed guest MSIs on arm64 as part of the PCI
> microconference. I presented some slides to illustrate some of the issues
> we're trying to solve:
> 
>   http://www.willdeacon.ukfsn.org/bitbucket/lpc-16/msi-in-guest-arm64.pdf
> 
> Punit took some notes (thanks!) on the etherpad here:
> 
>   https://etherpad.openstack.org/p/LPC2016_PCI

Thanks to both of you for the minutes and slides. Unfortunately I could
not travel but my ears were burning ;-)
> 
> although the discussion was pretty lively and jumped about, so I've had
> to go from memory where the notes didn't capture everything that was
> said.
> 
> To summarise, arm64 platforms differ in their handling of MSIs when compared
> to x86:
> 
>   1. The physical memory map is not standardised (Jon pointed out that
>  this is something that was realised late on)
>   2. MSIs are usually treated the same as DMA writes, in that they must be
>  mapped by the SMMU page tables so that they target a physical MSI
>  doorbell
>   3. On some platforms, MSIs bypass the SMMU entirely (e.g. due to an MSI
>  doorbell built into the PCI RC)
>   4. Platforms typically have some set of addresses that abort before
>  reaching the SMMU (e.g. because the PCI identifies them as P2P).
> 
> All of this means that userspace (QEMU) needs to identify the memory
> regions corresponding to points (3) and (4) and ensure that they are
> not allocated in the guest physical (IPA) space. For platforms that can
> remap the MSI doorbell as in (2), then some space also needs to be
> allocated for that.
> 
> Rather than treat these as separate problems, a better interface is to
> tell userspace about a set of reserved regions, and have this include
> the MSI doorbell, irrespective of whether or not it can be remapped.
> Don suggested that we statically pick an address for the doorbell in a
> similar way to x86, and have the kernel map it there. We could even pick
> 0xfee0. If it conflicts with a reserved region on the platform (due
> to (4)), then we'd obviously have to (deterministically?) allocate it
> somewhere else, but probably within the bottom 4G.

This is tentatively achieved now with
[1] [RFC v2 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 - Alt II
(http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1264506.html)
> 
> The next question is how to tell userspace about all of the reserved
> regions. Initially, the idea was to extend VFIO, however Alex pointed
> out a horrible scenario:
> 
>   1. QEMU spawns a VM on system 0
>   2. VM is migrated to system 1
>   3. QEMU attempts to passthrough a device using PCI hotplug
> 
> In this scenario, the guest memory map is chosen at step (1), yet there
> is no VFIO fd available to determine the reserved regions. Furthermore,
> the reserved regions may vary between system 0 and system 1. This pretty
> much rules out using VFIO to determine the reserved regions.Alex suggested
> that the SMMU driver can advertise the regions via /sys/class/iommu/. This
> would solve part of the problem, but migration between systems with
> different memory maps can still cause problems if the reserved regions
> of the new system conflict with the guest memory map chosen by QEMU.


OK so I understand we do not want anymore the VFIO chain capability API
(patch 5 of above series) but we prefer a sysfs approach instead.

I understand the sysfs approach which allows the userspace to get the
info earlier and independently on VFIO. Keeping in mind current QEMU
virt - which is not the only userspace - will not do much from this info
until we bring upheavals in virt address space management. So if I am
not wrong, at the moment the main action to be undertaken is the
rejection of the PCI hotplug in case we detect a collision.

I can respin [1]
- studying and taking into account Robin's comments about dm_regions
similarities
- removing the VFIO capability chain and replacing this by a sysfs API

Would that be OK?

What about Alex comments who wanted to report the usable memory ranges
instead of u

[PATCH] iommu/vt-d: Fix dead-locks in disable_dmar_iommu() path

2016-11-08 Thread Joerg Roedel
On Fri, Nov 04, 2016 at 10:38:29AM +0100, Iago Abal wrote:
> That patch was actually my first attempt at fixing the problem, but I
> ran the tool and I found a second possibility of deadlock:
> `domain_exit' calls to `domain_remove_dev_info', which also spin_locks
> on `device_domain_lock'.
> 
> Alternatively I could add another `__domain_exit' function that
> doesn't take the lock.

So that is actually not easy to do, I'd rather go with the simpler
solution of dropping the lock for domain_exit() invocation and then
re-start walking the list afterwards, like in the below patch:

>From bea64033dd7b5fb6296eda8266acab6364ce1554 Mon Sep 17 00:00:00 2001
From: Joerg Roedel 
Date: Tue, 8 Nov 2016 15:08:26 +0100
Subject: [PATCH] iommu/vt-d: Fix dead-locks in disable_dmar_iommu() path

It turns out that the disable_dmar_iommu() code-path tried
to get the device_domain_lock recursivly, which will
dead-lock when this code runs on dmar removal. Fix both
code-paths that could lead to the dead-lock.

Fixes: 55d940430ab9 ('iommu/vt-d: Get rid of domain->iommu_lock')
Reported-by: Iago Abal 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/intel-iommu.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a4407ea..3965e73 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1711,6 +1711,7 @@ static void disable_dmar_iommu(struct intel_iommu *iommu)
if (!iommu->domains || !iommu->domain_ids)
return;
 
+again:
spin_lock_irqsave(&device_domain_lock, flags);
list_for_each_entry_safe(info, tmp, &device_domain_list, global) {
struct dmar_domain *domain;
@@ -1723,10 +1724,19 @@ static void disable_dmar_iommu(struct intel_iommu 
*iommu)
 
domain = info->domain;
 
-   dmar_remove_one_dev_info(domain, info->dev);
+   __dmar_remove_one_dev_info(info);
 
-   if (!domain_type_is_vm_or_si(domain))
+   if (!domain_type_is_vm_or_si(domain)) {
+   /*
+* The domain_exit() function  can't be called under
+* device_domain_lock, as it takes this lock itself.
+* So release the lock here and re-run the loop
+* afterwards.
+*/
+   spin_unlock_irqrestore(&device_domain_lock, flags);
domain_exit(domain);
+   goto again;
+   }
}
spin_unlock_irqrestore(&device_domain_lock, flags);
 
-- 
2.6.6

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/dma-iommu: properly respect configured address space size

2016-11-08 Thread Marek Szyprowski

Hi Robin,


On 2016-11-08 12:37, Robin Murphy wrote:

On 07/11/16 13:06, Marek Szyprowski wrote:

When one called iommu_dma_init_domain() with size smaller than device's
DMA mask, the alloc_iova() will not respect it and always assume that all
IOVA addresses will be allocated from the the (base ... dev->dma_mask) range.

Is that actually a problem for anything?


Yes, I found this issue while working on next version of ARM & ARM64
DMA-mapping/IOMMU integration patchset and adapting Exynos drivers for the
new IOMMU/DMA-mapping glue.

Some Exynos devices (codec and camera isp) operate only on the limited (and
really small: 256M for example) DMA window. They use non-standard way of
addressing memory: an offset from the firmware base. However they still have
32bit DMA mask, as the firmware can be located basically everywhere in the
real DMA address space, but then they can access only next 256M from that
firmware base.




This patch fixes this issue by taking the configured address space size
parameter into account (if it is smaller than the device's dma_mask).

TBH I've been pondering ripping the size stuff out of dma-iommu, as it
all stems from me originally failing to understand what dma_32bit_pfn is
actually for. The truth is that iova_domains just don't have a size or
upper limit; however if devices with both large and small DMA masks
share a domain, then the top-down nature of the allocator means that
allocating for the less-capable devices would involve walking through
every out-of-range entry in the tree every time. Having cached32_node
based on dma_32bit_pfn just provides an optimised starting point for
searching within the smaller mask.


Right, this dma_32bit_pfn was really misleading at the first glance,
but then I found that this was something like end_pfn in case of dma-iommu
code.


Would it hurt any of your use-cases to relax/rework the reinitialisation
checks in iommu_dma_init_domain()? Alternatively if we really do have a
case for wanting a hard upper limit, it might make more sense to add an
end_pfn to the iova_domain and handle it in the allocator itself.


The proper support for end_pfn would be a better approach probably,
especially if we consider readability of the code.

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 2/7] iommu/exynos: Remove dead code

2016-11-08 Thread Marek Szyprowski
__sysmmu_enable/disable functions were designed to do ref-count based
operations, but current code always calls them only once, so the code for
checking the conditions and invalid conditions can be simply removed
without any influence to the driver operation.

Signed-off-by: Marek Szyprowski 
---
 drivers/iommu/exynos-iommu.c | 65 
 1 file changed, 17 insertions(+), 48 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 8ba0d60..4056228 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -460,9 +460,6 @@ static bool __sysmmu_disable(struct sysmmu_drvdata *data)
__sysmmu_disable_nocount(data);
 
dev_dbg(data->sysmmu, "Disabled\n");
-   } else  {
-   dev_dbg(data->sysmmu, "%d times left to disable\n",
-   data->activations);
}
 
spin_unlock_irqrestore(&data->lock, flags);
@@ -508,29 +505,18 @@ static void __sysmmu_enable_nocount(struct sysmmu_drvdata 
*data)
 static int __sysmmu_enable(struct sysmmu_drvdata *data, phys_addr_t pgtable,
   struct exynos_iommu_domain *domain)
 {
-   int ret = 0;
unsigned long flags;
 
spin_lock_irqsave(&data->lock, flags);
if (set_sysmmu_active(data)) {
data->pgtable = pgtable;
data->domain = domain;
-
__sysmmu_enable_nocount(data);
-
dev_dbg(data->sysmmu, "Enabled\n");
-   } else {
-   ret = (pgtable == data->pgtable) ? 1 : -EBUSY;
-
-   dev_dbg(data->sysmmu, "already enabled\n");
}
-
-   if (WARN_ON(ret < 0))
-   set_sysmmu_inactive(data); /* decrement count */
-
spin_unlock_irqrestore(&data->lock, flags);
 
-   return ret;
+   return 0;
 }
 
 static void sysmmu_tlb_invalidate_flpdcache(struct sysmmu_drvdata *data,
@@ -793,8 +779,8 @@ static void exynos_iommu_domain_free(struct iommu_domain 
*iommu_domain)
spin_lock_irqsave(&domain->lock, flags);
 
list_for_each_entry_safe(data, next, &domain->clients, domain_node) {
-   if (__sysmmu_disable(data))
-   data->master = NULL;
+   __sysmmu_disable(data);
+   data->master = NULL;
list_del_init(&data->domain_node);
}
 
@@ -829,31 +815,23 @@ static void exynos_iommu_detach_device(struct 
iommu_domain *iommu_domain,
phys_addr_t pagetable = virt_to_phys(domain->pgtable);
struct sysmmu_drvdata *data, *next;
unsigned long flags;
-   bool found = false;
 
if (!has_sysmmu(dev) || owner->domain != iommu_domain)
return;
 
spin_lock_irqsave(&domain->lock, flags);
list_for_each_entry_safe(data, next, &domain->clients, domain_node) {
-   if (data->master == dev) {
-   if (__sysmmu_disable(data)) {
-   data->master = NULL;
-   list_del_init(&data->domain_node);
-   }
-   pm_runtime_put(data->sysmmu);
-   found = true;
-   }
+   __sysmmu_disable(data);
+   data->master = NULL;
+   list_del_init(&data->domain_node);
+   pm_runtime_put(data->sysmmu);
}
spin_unlock_irqrestore(&domain->lock, flags);
 
owner->domain = NULL;
 
-   if (found)
-   dev_dbg(dev, "%s: Detached IOMMU with pgtable %pa\n",
-   __func__, &pagetable);
-   else
-   dev_err(dev, "%s: No IOMMU is attached\n", __func__);
+   dev_dbg(dev, "%s: Detached IOMMU with pgtable %pa\n", __func__,
+   &pagetable);
 }
 
 static int exynos_iommu_attach_device(struct iommu_domain *iommu_domain,
@@ -864,7 +842,6 @@ static int exynos_iommu_attach_device(struct iommu_domain 
*iommu_domain,
struct sysmmu_drvdata *data;
phys_addr_t pagetable = virt_to_phys(domain->pgtable);
unsigned long flags;
-   int ret = -ENODEV;
 
if (!has_sysmmu(dev))
return -ENODEV;
@@ -874,27 +851,19 @@ static int exynos_iommu_attach_device(struct iommu_domain 
*iommu_domain,
 
list_for_each_entry(data, &owner->controllers, owner_node) {
pm_runtime_get_sync(data->sysmmu);
-   ret = __sysmmu_enable(data, pagetable, domain);
-   if (ret >= 0) {
-   data->master = dev;
+   __sysmmu_enable(data, pagetable, domain);
+   data->master = dev;
 
-   spin_lock_irqsave(&domain->lock, flags);
-   list_add_tail(&data->domain_node, &domain->clients);
-   spin_unlock_irqrestore(&domain->lock, flags);
-   }
-   }
-
-   if (ret < 0) {
-   dev_err(dev

[PATCH v6 7/7] iommu/exynos: Use device dependency links to control runtime pm

2016-11-08 Thread Marek Szyprowski
This patch uses recently introduced device dependency links to track the
runtime pm state of the master's device. The goal is to let SYSMMU
controller device's runtime PM to follow the runtime PM state of the
respective master's device. This way each SYSMMU controller is active
only when its master's device is active and can properly restore or save
its state instead on runtime PM transition of master's device.
This approach replaces old behavior, when SYSMMU controller was set to
runtime active once after attaching to the master device. In the new
approach SYSMMU controllers no longer prevents respective power domains
to be turned off when master's device is not being used.

The dependency links also enforces proper order of suspending/restoring
devices during system sleep transition, so there is no more need to use
LATE_SYSTEM_SLEEP_PM_OPS-based workaround for ensuring that SYSMMUs are
suspended after their master devices.

Signed-off-by: Marek Szyprowski 
---
 drivers/iommu/exynos-iommu.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 5e6d7bb..4b05a15 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -633,8 +633,8 @@ static int __maybe_unused exynos_sysmmu_resume(struct 
device *dev)
 
 static const struct dev_pm_ops sysmmu_pm_ops = {
SET_RUNTIME_PM_OPS(exynos_sysmmu_suspend, exynos_sysmmu_resume, NULL)
-   SET_LATE_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
-pm_runtime_force_resume)
+   SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
+   pm_runtime_force_resume)
 };
 
 static const struct of_device_id sysmmu_of_match[] __initconst = {
@@ -781,10 +781,6 @@ static void exynos_iommu_detach_device(struct iommu_domain 
*iommu_domain,
if (!has_sysmmu(dev) || owner->domain != iommu_domain)
return;
 
-   list_for_each_entry(data, &owner->controllers, owner_node) {
-   pm_runtime_put_sync(data->sysmmu);
-   }
-
mutex_lock(&owner->rpm_lock);
 
list_for_each_entry(data, &owner->controllers, owner_node) {
@@ -848,10 +844,6 @@ static int exynos_iommu_attach_device(struct iommu_domain 
*iommu_domain,
 
mutex_unlock(&owner->rpm_lock);
 
-   list_for_each_entry(data, &owner->controllers, owner_node) {
-   pm_runtime_get_sync(data->sysmmu);
-   }
-
dev_dbg(dev, "%s: Attached IOMMU with pgtable %pa\n", __func__,
&pagetable);
 
@@ -1232,6 +1224,14 @@ static int exynos_iommu_of_xlate(struct device *dev,
 
list_add_tail(&data->owner_node, &owner->controllers);
data->master = dev;
+
+   /*
+* SYSMMU will be runtime activated via device link (dependency) to its
+* master device, so there are no direct calls to pm_runtime_get/put
+* in this driver.
+*/
+   device_link_add(dev, data->sysmmu, DL_FLAG_PM_RUNTIME);
+
return 0;
 }
 
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 3/7] iommu/exynos: Simplify internal enable/disable functions

2016-11-08 Thread Marek Szyprowski
Remove remaining leftovers of the ref-count related code in the
__sysmmu_enable/disable functions inline __sysmmu_enable/disable_nocount
to them. Suspend/resume callbacks now checks if master device is set for
given SYSMMU controller instead of relying on the activation count.

Signed-off-by: Marek Szyprowski 
---
 drivers/iommu/exynos-iommu.c | 104 ---
 1 file changed, 29 insertions(+), 75 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 4056228..f45b274 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -237,8 +237,8 @@ struct sysmmu_drvdata {
struct clk *aclk;   /* SYSMMU's aclk clock */
struct clk *pclk;   /* SYSMMU's pclk clock */
struct clk *clk_master; /* master's device clock */
-   int activations;/* number of calls to sysmmu_enable */
spinlock_t lock;/* lock for modyfying state */
+   int active; /* current status */
struct exynos_iommu_domain *domain; /* domain we belong to */
struct list_head domain_node;   /* node for domain clients list */
struct list_head owner_node;/* node for owner controllers list */
@@ -251,25 +251,6 @@ static struct exynos_iommu_domain *to_exynos_domain(struct 
iommu_domain *dom)
return container_of(dom, struct exynos_iommu_domain, domain);
 }
 
-static bool set_sysmmu_active(struct sysmmu_drvdata *data)
-{
-   /* return true if the System MMU was not active previously
-  and it needs to be initialized */
-   return ++data->activations == 1;
-}
-
-static bool set_sysmmu_inactive(struct sysmmu_drvdata *data)
-{
-   /* return true if the System MMU is needed to be disabled */
-   BUG_ON(data->activations < 1);
-   return --data->activations == 0;
-}
-
-static bool is_sysmmu_active(struct sysmmu_drvdata *data)
-{
-   return data->activations > 0;
-}
-
 static void sysmmu_unblock(struct sysmmu_drvdata *data)
 {
writel(CTRL_ENABLE, data->sfrbase + REG_MMU_CTRL);
@@ -388,7 +369,7 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
unsigned short reg_status, reg_clear;
int ret = -ENOSYS;
 
-   WARN_ON(!is_sysmmu_active(data));
+   WARN_ON(!data->active);
 
if (MMU_MAJ_VER(data->version) < 5) {
reg_status = REG_INT_STATUS;
@@ -434,37 +415,19 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void 
*dev_id)
return IRQ_HANDLED;
 }
 
-static void __sysmmu_disable_nocount(struct sysmmu_drvdata *data)
+static void __sysmmu_disable(struct sysmmu_drvdata *data)
 {
+   unsigned long flags;
+
clk_enable(data->clk_master);
 
+   spin_lock_irqsave(&data->lock, flags);
writel(CTRL_DISABLE, data->sfrbase + REG_MMU_CTRL);
writel(0, data->sfrbase + REG_MMU_CFG);
-
-   __sysmmu_disable_clocks(data);
-}
-
-static bool __sysmmu_disable(struct sysmmu_drvdata *data)
-{
-   bool disabled;
-   unsigned long flags;
-
-   spin_lock_irqsave(&data->lock, flags);
-
-   disabled = set_sysmmu_inactive(data);
-
-   if (disabled) {
-   data->pgtable = 0;
-   data->domain = NULL;
-
-   __sysmmu_disable_nocount(data);
-
-   dev_dbg(data->sysmmu, "Disabled\n");
-   }
-
+   data->active = false;
spin_unlock_irqrestore(&data->lock, flags);
 
-   return disabled;
+   __sysmmu_disable_clocks(data);
 }
 
 static void __sysmmu_init_config(struct sysmmu_drvdata *data)
@@ -481,17 +444,19 @@ static void __sysmmu_init_config(struct sysmmu_drvdata 
*data)
writel(cfg, data->sfrbase + REG_MMU_CFG);
 }
 
-static void __sysmmu_enable_nocount(struct sysmmu_drvdata *data)
+static void __sysmmu_enable(struct sysmmu_drvdata *data)
 {
+   unsigned long flags;
+
__sysmmu_enable_clocks(data);
 
+   spin_lock_irqsave(&data->lock, flags);
writel(CTRL_BLOCK, data->sfrbase + REG_MMU_CTRL);
-
__sysmmu_init_config(data);
-
__sysmmu_set_ptbase(data, data->pgtable);
-
writel(CTRL_ENABLE, data->sfrbase + REG_MMU_CTRL);
+   data->active = true;
+   spin_unlock_irqrestore(&data->lock, flags);
 
/*
 * SYSMMU driver keeps master's clock enabled only for the short
@@ -502,37 +467,18 @@ static void __sysmmu_enable_nocount(struct sysmmu_drvdata 
*data)
clk_disable(data->clk_master);
 }
 
-static int __sysmmu_enable(struct sysmmu_drvdata *data, phys_addr_t pgtable,
-  struct exynos_iommu_domain *domain)
-{
-   unsigned long flags;
-
-   spin_lock_irqsave(&data->lock, flags);
-   if (set_sysmmu_active(data)) {
-   data->pgtable = pgtable;
-   data->domain = domain;
-   __sysmmu_enable_nocount(data);
-   dev_dbg(data->sysmmu, "Enabled\n");
-   }
-   spin_unlock_irqrestore(&data-

[PATCH v6 6/7] iommu/exynos: Add runtime pm support

2016-11-08 Thread Marek Szyprowski
This patch adds runtime pm implementation, which is based on previous
suspend/resume code. SYSMMU controller is now being enabled/disabled mainly
from the runtime pm callbacks. System sleep callbacks relies on generic
pm_runtime_force_suspend/pm_runtime_force_resume helpers. To ensure
internal state consistency, additional lock for runtime pm transitions
was introduced.

Signed-off-by: Marek Szyprowski 
---
 drivers/iommu/exynos-iommu.c | 45 +++-
 1 file changed, 36 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index a959443..5e6d7bb 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -206,6 +206,7 @@ struct sysmmu_fault_info {
 struct exynos_iommu_owner {
struct list_head controllers;   /* list of sysmmu_drvdata.owner_node */
struct iommu_domain *domain;/* domain this device is attached */
+   struct mutex rpm_lock;  /* for runtime pm of all sysmmus */
 };
 
 /*
@@ -594,40 +595,46 @@ static int __init exynos_sysmmu_probe(struct 
platform_device *pdev)
return 0;
 }
 
-#ifdef CONFIG_PM_SLEEP
-static int exynos_sysmmu_suspend(struct device *dev)
+static int __maybe_unused exynos_sysmmu_suspend(struct device *dev)
 {
struct sysmmu_drvdata *data = dev_get_drvdata(dev);
struct device *master = data->master;
 
if (master) {
-   pm_runtime_put(dev);
+   struct exynos_iommu_owner *owner = master->archdata.iommu;
+
+   mutex_lock(&owner->rpm_lock);
if (data->domain) {
dev_dbg(data->sysmmu, "saving state\n");
__sysmmu_disable(data);
}
+   mutex_unlock(&owner->rpm_lock);
}
return 0;
 }
 
-static int exynos_sysmmu_resume(struct device *dev)
+static int __maybe_unused exynos_sysmmu_resume(struct device *dev)
 {
struct sysmmu_drvdata *data = dev_get_drvdata(dev);
struct device *master = data->master;
 
if (master) {
-   pm_runtime_get_sync(dev);
+   struct exynos_iommu_owner *owner = master->archdata.iommu;
+
+   mutex_lock(&owner->rpm_lock);
if (data->domain) {
dev_dbg(data->sysmmu, "restoring state\n");
__sysmmu_enable(data);
}
+   mutex_unlock(&owner->rpm_lock);
}
return 0;
 }
-#endif
 
 static const struct dev_pm_ops sysmmu_pm_ops = {
-   SET_LATE_SYSTEM_SLEEP_PM_OPS(exynos_sysmmu_suspend, 
exynos_sysmmu_resume)
+   SET_RUNTIME_PM_OPS(exynos_sysmmu_suspend, exynos_sysmmu_resume, NULL)
+   SET_LATE_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
+pm_runtime_force_resume)
 };
 
 static const struct of_device_id sysmmu_of_match[] __initconst = {
@@ -775,7 +782,15 @@ static void exynos_iommu_detach_device(struct iommu_domain 
*iommu_domain,
return;
 
list_for_each_entry(data, &owner->controllers, owner_node) {
-   __sysmmu_disable(data);
+   pm_runtime_put_sync(data->sysmmu);
+   }
+
+   mutex_lock(&owner->rpm_lock);
+
+   list_for_each_entry(data, &owner->controllers, owner_node) {
+   pm_runtime_get_noresume(data->sysmmu);
+   if (pm_runtime_active(data->sysmmu))
+   __sysmmu_disable(data);
pm_runtime_put(data->sysmmu);
}
 
@@ -790,6 +805,7 @@ static void exynos_iommu_detach_device(struct iommu_domain 
*iommu_domain,
owner->domain = NULL;
spin_unlock_irqrestore(&domain->lock, flags);
 
+   mutex_unlock(&owner->rpm_lock);
 
dev_dbg(dev, "%s: Detached IOMMU with pgtable %pa\n", __func__,
&pagetable);
@@ -810,6 +826,8 @@ static int exynos_iommu_attach_device(struct iommu_domain 
*iommu_domain,
if (owner->domain)
exynos_iommu_detach_device(owner->domain, dev);
 
+   mutex_lock(&owner->rpm_lock);
+
spin_lock_irqsave(&domain->lock, flags);
list_for_each_entry(data, &owner->controllers, owner_node) {
spin_lock(&data->lock);
@@ -822,8 +840,16 @@ static int exynos_iommu_attach_device(struct iommu_domain 
*iommu_domain,
spin_unlock_irqrestore(&domain->lock, flags);
 
list_for_each_entry(data, &owner->controllers, owner_node) {
+   pm_runtime_get_noresume(data->sysmmu);
+   if (pm_runtime_active(data->sysmmu))
+   __sysmmu_enable(data);
+   pm_runtime_put(data->sysmmu);
+   }
+
+   mutex_unlock(&owner->rpm_lock);
+
+   list_for_each_entry(data, &owner->controllers, owner_node) {
pm_runtime_get_sync(data->sysmmu);
-   __sysmmu_enable(data);
}
 
dev_dbg(dev, "%s: Attached IOMMU with pgtable %pa\n", __func__,
@@ -1200,6 +1226,7 @@ static int

[PATCH v6 4/7] iommu/exynos: Set master device once on boot

2016-11-08 Thread Marek Szyprowski
To avoid possible races, set master device pointer in each SYSMMU
controller once on boot. Suspend/resume callbacks now properly relies on
the configured iommu domain to enable or disable SYSMMU controller.
While changing the code, also update the sleep debug messages and make
them conditional.

Signed-off-by: Marek Szyprowski 
---
 drivers/iommu/exynos-iommu.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index f45b274..28e570b 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -600,10 +600,12 @@ static int exynos_sysmmu_suspend(struct device *dev)
struct sysmmu_drvdata *data = dev_get_drvdata(dev);
struct device *master = data->master;
 
-   dev_dbg(dev, "suspend\n");
if (master) {
-   __sysmmu_disable(data);
pm_runtime_put(dev);
+   if (data->domain) {
+   dev_dbg(data->sysmmu, "saving state\n");
+   __sysmmu_disable(data);
+   }
}
return 0;
 }
@@ -613,10 +615,12 @@ static int exynos_sysmmu_resume(struct device *dev)
struct sysmmu_drvdata *data = dev_get_drvdata(dev);
struct device *master = data->master;
 
-   dev_dbg(dev, "resume\n");
if (master) {
pm_runtime_get_sync(dev);
-   __sysmmu_enable(data);
+   if (data->domain) {
+   dev_dbg(data->sysmmu, "restoring state\n");
+   __sysmmu_enable(data);
+   }
}
return 0;
 }
@@ -730,7 +734,6 @@ static void exynos_iommu_domain_free(struct iommu_domain 
*iommu_domain)
__sysmmu_disable(data);
data->pgtable = 0;
data->domain = NULL;
-   data->master = NULL;
list_del_init(&data->domain_node);
}
 
@@ -772,7 +775,6 @@ static void exynos_iommu_detach_device(struct iommu_domain 
*iommu_domain,
spin_lock_irqsave(&domain->lock, flags);
list_for_each_entry_safe(data, next, &domain->clients, domain_node) {
__sysmmu_disable(data);
-   data->master = NULL;
data->pgtable = 0;
data->domain = NULL;
list_del_init(&data->domain_node);
@@ -806,7 +808,6 @@ static int exynos_iommu_attach_device(struct iommu_domain 
*iommu_domain,
data->domain = domain;
pm_runtime_get_sync(data->sysmmu);
__sysmmu_enable(data);
-   data->master = dev;
 
spin_lock_irqsave(&domain->lock, flags);
list_add_tail(&data->domain_node, &domain->clients);
@@ -1192,6 +1193,7 @@ static int exynos_iommu_of_xlate(struct device *dev,
}
 
list_add_tail(&data->owner_node, &owner->controllers);
+   data->master = dev;
return 0;
 }
 
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 5/7] iommu/exynos: Rework and fix internal locking

2016-11-08 Thread Marek Szyprowski
This patch reworks locking in the exynos_iommu_attach/detach_device
functions to ensure that all entries of the sysmmu_drvdata and
exynos_iommu_owner structure are updated under the respective spinlocks,
while runtime pm functions are called without any spinlocks held.

Signed-off-by: Marek Szyprowski 
---
 drivers/iommu/exynos-iommu.c | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 28e570b..a959443 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -731,10 +731,12 @@ static void exynos_iommu_domain_free(struct iommu_domain 
*iommu_domain)
spin_lock_irqsave(&domain->lock, flags);
 
list_for_each_entry_safe(data, next, &domain->clients, domain_node) {
+   spin_lock(&data->lock);
__sysmmu_disable(data);
data->pgtable = 0;
data->domain = NULL;
list_del_init(&data->domain_node);
+   spin_unlock(&data->lock);
}
 
spin_unlock_irqrestore(&domain->lock, flags);
@@ -772,17 +774,22 @@ static void exynos_iommu_detach_device(struct 
iommu_domain *iommu_domain,
if (!has_sysmmu(dev) || owner->domain != iommu_domain)
return;
 
+   list_for_each_entry(data, &owner->controllers, owner_node) {
+   __sysmmu_disable(data);
+   pm_runtime_put(data->sysmmu);
+   }
+
spin_lock_irqsave(&domain->lock, flags);
list_for_each_entry_safe(data, next, &domain->clients, domain_node) {
-   __sysmmu_disable(data);
+   spin_lock(&data->lock);
data->pgtable = 0;
data->domain = NULL;
list_del_init(&data->domain_node);
-   pm_runtime_put(data->sysmmu);
+   spin_unlock(&data->lock);
}
+   owner->domain = NULL;
spin_unlock_irqrestore(&domain->lock, flags);
 
-   owner->domain = NULL;
 
dev_dbg(dev, "%s: Detached IOMMU with pgtable %pa\n", __func__,
&pagetable);
@@ -803,18 +810,22 @@ static int exynos_iommu_attach_device(struct iommu_domain 
*iommu_domain,
if (owner->domain)
exynos_iommu_detach_device(owner->domain, dev);
 
+   spin_lock_irqsave(&domain->lock, flags);
list_for_each_entry(data, &owner->controllers, owner_node) {
+   spin_lock(&data->lock);
data->pgtable = pagetable;
data->domain = domain;
+   list_add_tail(&data->domain_node, &domain->clients);
+   spin_unlock(&data->lock);
+   }
+   owner->domain = iommu_domain;
+   spin_unlock_irqrestore(&domain->lock, flags);
+
+   list_for_each_entry(data, &owner->controllers, owner_node) {
pm_runtime_get_sync(data->sysmmu);
__sysmmu_enable(data);
-
-   spin_lock_irqsave(&domain->lock, flags);
-   list_add_tail(&data->domain_node, &domain->clients);
-   spin_unlock_irqrestore(&domain->lock, flags);
}
 
-   owner->domain = iommu_domain;
dev_dbg(dev, "%s: Attached IOMMU with pgtable %pa\n", __func__,
&pagetable);
 
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 0/7] Exynos IOMMU: proper runtime PM support (use device dependencies)

2016-11-08 Thread Marek Szyprowski
Hello,

This is another update of the patchset for adding proper runtime PM
support to Exynos IOMMU driver. This has been achieved by using recently
introduced device links, which lets SYSMMU controller's runtime PM to
follow master's device runtime PM state (the device which actually
performs DMA transaction). The main idea behind this solution is an
observation that any DMA activity from master device can be done only
when master device is active, thus when master device is suspended
SYSMMU controller device can also be suspended.

This patchset solves the problem of all power domains being always
enabled. It happened, because all SYSMMU controllers (which belongs to
the same domains) are permanently kept active, because existing driver
was simplified and kept SYSMMU device active all the time after
initialization and attaching to the master device.

This patchset is based on sixth version of Rafael's "Functional dependencies
between devices" patchset [1], which has been merged to Greg's driver-core-next
branch [2] (last patch commit id is baa8809f60971d10220dfe79248f54b2b265f003).
As Greg pointed, the branch will not be rebased and can be used as a base
for applying my patchset [3].

Joerg: I hope you can merge this version on top of Greg's driver-core-next
branch to iommu tree.

[1] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1261311.html
[2] git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git 
driver-core-next
[3] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1264906.html

If one wants to test this patchset, I've provided a branch with all needed
patches (a small fix for Exynos4 FIMC-IS DTS is still needed, it is already
in samsung tree):
https://git.linaro.org/people/marek.szyprowski/linux-srpol.git v4.9-iommu-pm-v6

Best regards
Marek Szyprowski
Samsung R&D Institute Poland


Changelog:
v6:
- removed LATE_SYSTEM_SLEEP_PM_OPS-based workaround, because it is no longer
  needed after introducing device links (they also take care of proper system
  sleep suspend/resume sequence)
- updated some comments

v5: https://lkml.org/lkml/2016/10/20/70
- split main patch into several small changes for easier review (requested
  by Luis Rodriquez)
- fixed usage of runtime_pm_active, now it is guarded by 
pm_runtime_get_noresume()
  and pm_runtime_put() pair

v4: http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1241601.html
- rebased on top of v4 of device dependencies/links patchset, what resolved
  system hang on reboot

v3: http://www.spinics.net/lists/linux-samsung-soc/msg55256.html
- rebased on top of latest device dependencies/links patchset
- added proper locking between runtime pm, iommu_attach/detach and sysmmu
  enable/disable(added per iommu owner device's rpm lock)

v2: http://www.spinics.net/lists/arm-kernel/msg512082.html
- replaced PM notifiers with generic device dependencies/links developed
  by Rafael J. Wysocki

v1: http://www.spinics.net/lists/arm-kernel/msg509600.html
- initial version


Patch summary:

Marek Szyprowski (7):
  iommu/exynos: Remove excessive, useless debug
  iommu/exynos: Remove dead code
  iommu/exynos: Simplify internal enable/disable functions
  iommu/exynos: Set master device once on boot
  iommu/exynos: Rework and fix internal locking
  iommu/exynos: Add runtime pm support
  iommu/exynos: Use device dependency links to control runtime pm

 drivers/iommu/exynos-iommu.c | 230 ++-
 1 file changed, 95 insertions(+), 135 deletions(-)

-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 1/7] iommu/exynos: Remove excessive, useless debug

2016-11-08 Thread Marek Szyprowski
Remove excessive, useless debug about skipping TLB invalidation, which
is a normal situation when more aggressive power management is enabled.

Signed-off-by: Marek Szyprowski 
---
 drivers/iommu/exynos-iommu.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 30808e9..8ba0d60 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -578,9 +578,6 @@ static void sysmmu_tlb_invalidate_entry(struct 
sysmmu_drvdata *data,
sysmmu_unblock(data);
}
clk_disable(data->clk_master);
-   } else {
-   dev_dbg(data->master,
-   "disabled. Skipping TLB invalidation @ %#x\n", iova);
}
spin_unlock_irqrestore(&data->lock, flags);
 }
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/dma-iommu: properly respect configured address space size

2016-11-08 Thread Robin Murphy
Hi Marek,

On 07/11/16 13:06, Marek Szyprowski wrote:
> When one called iommu_dma_init_domain() with size smaller than device's
> DMA mask, the alloc_iova() will not respect it and always assume that all
> IOVA addresses will be allocated from the the (base ... dev->dma_mask) range.

Is that actually a problem for anything?

> This patch fixes this issue by taking the configured address space size
> parameter into account (if it is smaller than the device's dma_mask).

TBH I've been pondering ripping the size stuff out of dma-iommu, as it
all stems from me originally failing to understand what dma_32bit_pfn is
actually for. The truth is that iova_domains just don't have a size or
upper limit; however if devices with both large and small DMA masks
share a domain, then the top-down nature of the allocator means that
allocating for the less-capable devices would involve walking through
every out-of-range entry in the tree every time. Having cached32_node
based on dma_32bit_pfn just provides an optimised starting point for
searching within the smaller mask.

Would it hurt any of your use-cases to relax/rework the reinitialisation
checks in iommu_dma_init_domain()? Alternatively if we really do have a
case for wanting a hard upper limit, it might make more sense to add an
end_pfn to the iova_domain and handle it in the allocator itself.

Robin.

> Signed-off-by: Marek Szyprowski 
> ---
>  drivers/iommu/dma-iommu.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index c5ab8667e6f2..8b4b72654359 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -212,11 +212,13 @@ static struct iova *__alloc_iova(struct iommu_domain 
> *domain, size_t size,
>  
>   if (domain->geometry.force_aperture)
>   dma_limit = min(dma_limit, domain->geometry.aperture_end);
> +
> + dma_limit = min(dma_limit >> shift, (dma_addr_t)iovad->dma_32bit_pfn);
>   /*
>* Enforce size-alignment to be safe - there could perhaps be an
>* attribute to control this per-device, or at least per-domain...
>*/
> - return alloc_iova(iovad, length, dma_limit >> shift, true);
> + return alloc_iova(iovad, length, dma_limit, true);
>  }
>  
>  /* The IOVA allocator knows what we mapped, so just unmap whatever that was 
> */
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu