Re: [PATCH 2/3] iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74

2017-05-03 Thread Will Deacon
Hi Geetha,

On Tue, May 02, 2017 at 12:01:15PM +0530, Geetha Akula wrote:
> SMMU_IIDR register is broken on T99, that the reason we are using MIDR.

Urgh, that's unfortunate. In what way is it broken?

> If using MIDR is not accepted, can we enable errata based on SMMU resource 
> size?
> some thing like below.

No, you need to get your model number added to IORT after all if the IIDR
can't uniqely identify the part.

Sorry,

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V8 07/11] iommu: of: Handle IOMMU lookup failure with deferred probing or error

2017-05-03 Thread Robin Murphy
Hi Geert,

On 02/05/17 19:35, Geert Uytterhoeven wrote:
> Hi Sricharan,
> 
> On Fri, Feb 3, 2017 at 4:48 PM, Sricharan R  wrote:
>> From: Laurent Pinchart 
>>
>> Failures to look up an IOMMU when parsing the DT iommus property need to
>> be handled separately from the .of_xlate() failures to support deferred
>> probing.
>>
>> The lack of a registered IOMMU can be caused by the lack of a driver for
>> the IOMMU, the IOMMU device probe not having been performed yet, having
>> been deferred, or having failed.
>>
>> The first case occurs when the device tree describes the bus master and
>> IOMMU topology correctly but no device driver exists for the IOMMU yet
>> or the device driver has not been compiled in. Return NULL, the caller
>> will configure the device without an IOMMU.
>>
>> The second and third cases are handled by deferring the probe of the bus
>> master device which will eventually get reprobed after the IOMMU.
>>
>> The last case is currently handled by deferring the probe of the bus
>> master device as well. A mechanism to either configure the bus master
>> device without an IOMMU or to fail the bus master device probe depending
>> on whether the IOMMU is optional or mandatory would be a good
>> enhancement.
>>
>> Tested-by: Marek Szyprowski 
>> Signed-off-by: Laurent Pichart 
>> Signed-off-by: Sricharan R 
> 
> This patch broke Renesas R-Car Gen3 platforms in renesas-drivers.
> As the IOMMU nodes in DT are not yet enabled, all devices having iommus
> properties in DT now fail to probe.

How exactly do they fail to probe? Per d7b0558230e4, if there are no ops
registered then they should merely defer until we reach the point of
giving up and ignoring the IOMMU. Is it just that you have no other
late-probing drivers or post-init module loads to kick the deferred
queue after that point? I did try to find a way to explicitly kick it
from a suitably late initcall, but there didn't seem to be any obvious
public interface - anyone have any suggestions?

I think that's more of a general problem with the probe deferral
mechanism itself (I've seen the same thing happen with some of the
CoreSight stuff on Juno due to the number of inter-component
dependencies) rather than any specific fault of this series.

Robin.

> This can be fixed by either:
>   - Disabling CONFIG_IPMMU_VMSA, or
>   - Reverting commit 7b07cbefb68d486f (but keeping "int ret = 0;").
> 
> Note that this was a bit hard to investigate, as R-Car Gen3 support wasn't
> upstreamed yet, so bisection pointed to a merge commit.
> 
>> ---
>>  [*] Fixed minor comment from Bjorn for removing the pci.h header inclusion
>>  in of_iommu.c
>>
>>  drivers/base/dma-mapping.c | 5 +++--
>>  drivers/iommu/of_iommu.c   | 4 ++--
>>  drivers/of/device.c| 7 ++-
>>  include/linux/of_device.h  | 9 ++---
>>  4 files changed, 17 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/base/dma-mapping.c b/drivers/base/dma-mapping.c
>> index 449b948..82bd45c 100644
>> --- a/drivers/base/dma-mapping.c
>> +++ b/drivers/base/dma-mapping.c
>> @@ -353,6 +353,7 @@ int dma_configure(struct device *dev)
>>  {
>> struct device *bridge = NULL, *dma_dev = dev;
>> enum dev_dma_attr attr;
>> +   int ret = 0;
>>
>> if (dev_is_pci(dev)) {
>> bridge = pci_get_host_bridge_device(to_pci_dev(dev));
>> @@ -363,7 +364,7 @@ int dma_configure(struct device *dev)
>> }
>>
>> if (dma_dev->of_node) {
>> -   of_dma_configure(dev, dma_dev->of_node);
>> +   ret = of_dma_configure(dev, dma_dev->of_node);
>> } else if (has_acpi_companion(dma_dev)) {
>> attr = 
>> acpi_get_dma_attr(to_acpi_device_node(dma_dev->fwnode));
>> if (attr != DEV_DMA_NOT_SUPPORTED)
>> @@ -373,7 +374,7 @@ int dma_configure(struct device *dev)
>> if (bridge)
>> pci_put_host_bridge_device(bridge);
>>
>> -   return 0;
>> +   return ret;
>>  }
>>
>>  void dma_deconfigure(struct device *dev)
>> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
>> index 1f92d98..2d04663 100644
>> --- a/drivers/iommu/of_iommu.c
>> +++ b/drivers/iommu/of_iommu.c
>> @@ -236,7 +236,7 @@ const struct iommu_ops *of_iommu_configure(struct device 
>> *dev,
>> ops = ERR_PTR(err);
>> }
>>
>> -   return IS_ERR(ops) ? NULL : ops;
>> +   return ops;
>>  }
>>
>>  static int __init of_iommu_init(void)
>> @@ -247,7 +247,7 @@ static int __init of_iommu_init(void)
>> for_each_matching_node_and_match(np, matches, &match) {
>> const of_iommu_init_fn init_fn = match->data;
>>
>> -   if (init_fn(np))
>> +   if (init_fn && init_fn(np))
>> pr_err("Failed to initialise IOMMU %s\n",
>> of_node_full_name(np));
>> }
>> diff --git a/drivers/of/device.c b/drivers/of/device.c
>> index c17c19d..ba51ca6 100644
>> --- a/drivers/of/device.c
>> +++

Re: [PATCH V8 07/11] iommu: of: Handle IOMMU lookup failure with deferred probing or error

2017-05-03 Thread Sricharan R
Hi Robin,

On 5/3/2017 3:24 PM, Robin Murphy wrote:
> Hi Geert,
> 
> On 02/05/17 19:35, Geert Uytterhoeven wrote:
>> Hi Sricharan,
>>
>> On Fri, Feb 3, 2017 at 4:48 PM, Sricharan R  wrote:
>>> From: Laurent Pinchart 
>>>
>>> Failures to look up an IOMMU when parsing the DT iommus property need to
>>> be handled separately from the .of_xlate() failures to support deferred
>>> probing.
>>>
>>> The lack of a registered IOMMU can be caused by the lack of a driver for
>>> the IOMMU, the IOMMU device probe not having been performed yet, having
>>> been deferred, or having failed.
>>>
>>> The first case occurs when the device tree describes the bus master and
>>> IOMMU topology correctly but no device driver exists for the IOMMU yet
>>> or the device driver has not been compiled in. Return NULL, the caller
>>> will configure the device without an IOMMU.
>>>
>>> The second and third cases are handled by deferring the probe of the bus
>>> master device which will eventually get reprobed after the IOMMU.
>>>
>>> The last case is currently handled by deferring the probe of the bus
>>> master device as well. A mechanism to either configure the bus master
>>> device without an IOMMU or to fail the bus master device probe depending
>>> on whether the IOMMU is optional or mandatory would be a good
>>> enhancement.
>>>
>>> Tested-by: Marek Szyprowski 
>>> Signed-off-by: Laurent Pichart 
>>> Signed-off-by: Sricharan R 
>>
>> This patch broke Renesas R-Car Gen3 platforms in renesas-drivers.
>> As the IOMMU nodes in DT are not yet enabled, all devices having iommus
>> properties in DT now fail to probe.
> 
> How exactly do they fail to probe? Per d7b0558230e4, if there are no ops
> registered then they should merely defer until we reach the point of
> giving up and ignoring the IOMMU. Is it just that you have no other
> late-probing drivers or post-init module loads to kick the deferred
> queue after that point? I did try to find a way to explicitly kick it
> from a suitably late initcall, but there didn't seem to be any obvious
> public interface - anyone have any suggestions?
> 
> I think that's more of a general problem with the probe deferral
> mechanism itself (I've seen the same thing happen with some of the
> CoreSight stuff on Juno due to the number of inter-component
> dependencies) rather than any specific fault of this series.
> 

I was thinking of an additional check like below to avoid the
situation ?

>From 499b6e662f60f23740b8880882b0a16f16434501 Mon Sep 17 00:00:00 2001
From: Sricharan R 
Date: Wed, 3 May 2017 13:16:59 +0530
Subject: [PATCH] iommu: of: Fix check for returning EPROBE_DEFER

While returning EPROBE_DEFER for iommu masters
take in to account of iommu nodes that could be
marked in DT as 'status=disabled', in which case
simply return NULL and let the master's probe
continue rather than deferring.

Signed-off-by: Sricharan R 
---
 drivers/iommu/of_iommu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 9f44ee8..e6e9bec 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -118,6 +118,7 @@ static bool of_iommu_driver_present(struct device_node *np)

ops = iommu_ops_from_fwnode(fwnode);
if ((ops && !ops->of_xlate) ||
+   !of_device_is_available(iommu_spec->np) ||
(!ops && !of_iommu_driver_present(iommu_spec->np)))
return NULL;

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation


Regards,
 Sricharan


> Robin.
> 
>> This can be fixed by either:
>>   - Disabling CONFIG_IPMMU_VMSA, or
>>   - Reverting commit 7b07cbefb68d486f (but keeping "int ret = 0;").
>>
>> Note that this was a bit hard to investigate, as R-Car Gen3 support wasn't
>> upstreamed yet, so bisection pointed to a merge commit.
>>
>>> ---
>>>  [*] Fixed minor comment from Bjorn for removing the pci.h header inclusion
>>>  in of_iommu.c
>>>
>>>  drivers/base/dma-mapping.c | 5 +++--
>>>  drivers/iommu/of_iommu.c   | 4 ++--
>>>  drivers/of/device.c| 7 ++-
>>>  include/linux/of_device.h  | 9 ++---
>>>  4 files changed, 17 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/base/dma-mapping.c b/drivers/base/dma-mapping.c
>>> index 449b948..82bd45c 100644
>>> --- a/drivers/base/dma-mapping.c
>>> +++ b/drivers/base/dma-mapping.c
>>> @@ -353,6 +353,7 @@ int dma_configure(struct device *dev)
>>>  {
>>> struct device *bridge = NULL, *dma_dev = dev;
>>> enum dev_dma_attr attr;
>>> +   int ret = 0;
>>>
>>> if (dev_is_pci(dev)) {
>>> bridge = pci_get_host_bridge_device(to_pci_dev(dev));
>>> @@ -363,7 +364,7 @@ int dma_configure(struct device *dev)
>>> }
>>>
>>> if (dma_dev->of_node) {
>>> -   of_dma_configure(dev, dma_dev->of_node);
>>> +   ret = of_dma_configure(dev, dma_dev->of_node);
>>> } else if (has_acpi_companion(dma_de

Re: [PATCH] drivers/of_iommu: ignore SMMU DT nodes with status 'disabled'

2017-05-03 Thread Robin Murphy
On 28/04/17 14:22, Ard Biesheuvel wrote:
> On 28 April 2017 at 14:17, Will Deacon  wrote:
>> On Fri, Apr 28, 2017 at 02:14:49PM +0100, Ard Biesheuvel wrote:
>>> On 28 April 2017 at 14:11, Will Deacon  wrote:
 Hi Ard,

 [+ devicetree@]

 On Fri, Apr 14, 2017 at 01:43:15PM +0100, Ard Biesheuvel wrote:
> DT nodes may have a status property, and if they do, such nodes should
> only be considered present if the status property is set to 'okay'.
>
> Currently, we call the init function of IOMMUs described by the device
> tree without taking this into account, which may result in the output
> below on systems where some SMMUs may be legally disabled.
>
>  Failed to initialise IOMMU /smb/smmu@e020
>  Failed to initialise IOMMU /smb/smmu@e0c0
>  arm-smmu e0a0.smmu: probing hardware configuration...
>  arm-smmu e0a0.smmu: SMMUv1 with:
>  arm-smmu e0a0.smmu:  stage 2 translation
>  arm-smmu e0a0.smmu:  coherent table walk
>  arm-smmu e0a0.smmu:  stream matching with 32 register groups, mask 
> 0x7fff
>  arm-smmu e0a0.smmu:  8 context banks (8 stage-2 only)
>  arm-smmu e0a0.smmu:  Supported page sizes: 0x60211000
>  arm-smmu e0a0.smmu:  Stage-2: 40-bit IPA -> 40-bit PA
>  Failed to initialise IOMMU /smb/smmu@e060
>  Failed to initialise IOMMU /smb/smmu@e080
>
> Since this is not an error condition, only call the init function if
> the device is enabled, which also inhibits the spurious error messages.
>
> Signed-off-by: Ard Biesheuvel 
> ---
>  drivers/iommu/of_iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index 2683e9fc0dcf..2dd1206e6c0d 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -183,7 +183,7 @@ static int __init of_iommu_init(void)
>   for_each_matching_node_and_match(np, matches, &match) {
>   const of_iommu_init_fn init_fn = match->data;
>
> - if (init_fn(np))
> + if (of_device_is_available(np) && init_fn(np))
>   pr_err("Failed to initialise IOMMU %s\n",
>   of_node_full_name(np));
>   }

 Is there a definition of what status = "disabled" is supposed to mean for 
 an
 IOMMU? For example, that could mean that the firmware has pre-programmed 
 the
 SMMU with particular translations or memory attributes (a bit like the
 CCA=1, CPM=1, DACS=0 case in ACPI IORT), or even disabled DMA traffic
 altogether.

 So I think we'd need an update to the generic IOMMU binding text to say
 exactly what the semantics are supposed to be here.

>>>
>>> I agree that it might make sense to describe the behavior of the IOMMU
>>> when it is left in the state we found it in. But that is not the same
>>> as status=disabled.
>>>
>>> The DTS subtree contains loads and loads of boilerplate
>>> configurations, where only some pieces are enabled in the final image
>>> by setting status=okay. So a node that has status 'disabled' should be
>>> treated as 'not present', not as 'present but can be ignored under
>>> assumptions such and such'
>>>
>>> In other words, I think we are talking about two different issues here.
>>
>> I'm not so sure... if we have a master device that has an iommus= property
>> pointing to an IOMMU with status="disabled", I really don't know whether we
>> should:
>>
>>   1. Assume the master can do DMA with a 1:1 mapping of memory and no
>>  changes to memory attributes
>>
>>   2. Assume the master can do DMA with a 1:1 mapping of memory, but
>>  potentially with changes to the attributes
>>
>>   3. Assume the master can do DMA, but with some pre-existing translation
>>  (what?)
>>
>>   4. Assume the master can't do DMA
>>
>> and I also don't know whether the "dma-coherent" property remains valid.
>>
> 
> Ah yes. Good point.
> 
> So indeed, there should be some IOMMU specific status property that
> can convey all of the above, or 1. and 4. at the minimum

FWIW, the underlying issue being addressed here should be going away now
anyway, since the now-queued probe deferral series obviates the init_fn
early-device-creation bodge. I've been deliberately ignoring it for some
time for precisely that reason ;)

Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/3] iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74

2017-05-03 Thread Geetha Akula
Hi Will,

We will resubmit the patches based on IORT.


Thank you,
Geetha.

On Wed, May 3, 2017 at 3:17 PM, Will Deacon  wrote:
> Hi Geetha,
>
> On Tue, May 02, 2017 at 12:01:15PM +0530, Geetha Akula wrote:
>> SMMU_IIDR register is broken on T99, that the reason we are using MIDR.
>
> Urgh, that's unfortunate. In what way is it broken?
>
>> If using MIDR is not accepted, can we enable errata based on SMMU resource 
>> size?
>> some thing like below.
>
> No, you need to get your model number added to IORT after all if the IIDR
> can't uniqely identify the part.
>
> Sorry,
>
> Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] drivers/of_iommu: ignore SMMU DT nodes with status 'disabled'

2017-05-03 Thread Ard Biesheuvel

> On 3 May 2017, at 11:32, Robin Murphy  wrote:
> 
>> On 28/04/17 14:22, Ard Biesheuvel wrote:
>>> On 28 April 2017 at 14:17, Will Deacon  wrote:
 On Fri, Apr 28, 2017 at 02:14:49PM +0100, Ard Biesheuvel wrote:
> On 28 April 2017 at 14:11, Will Deacon  wrote:
> Hi Ard,
> 
> [+ devicetree@]
> 
>> On Fri, Apr 14, 2017 at 01:43:15PM +0100, Ard Biesheuvel wrote:
>> DT nodes may have a status property, and if they do, such nodes should
>> only be considered present if the status property is set to 'okay'.
>> 
>> Currently, we call the init function of IOMMUs described by the device
>> tree without taking this into account, which may result in the output
>> below on systems where some SMMUs may be legally disabled.
>> 
>> Failed to initialise IOMMU /smb/smmu@e020
>> Failed to initialise IOMMU /smb/smmu@e0c0
>> arm-smmu e0a0.smmu: probing hardware configuration...
>> arm-smmu e0a0.smmu: SMMUv1 with:
>> arm-smmu e0a0.smmu:  stage 2 translation
>> arm-smmu e0a0.smmu:  coherent table walk
>> arm-smmu e0a0.smmu:  stream matching with 32 register groups, mask 
>> 0x7fff
>> arm-smmu e0a0.smmu:  8 context banks (8 stage-2 only)
>> arm-smmu e0a0.smmu:  Supported page sizes: 0x60211000
>> arm-smmu e0a0.smmu:  Stage-2: 40-bit IPA -> 40-bit PA
>> Failed to initialise IOMMU /smb/smmu@e060
>> Failed to initialise IOMMU /smb/smmu@e080
>> 
>> Since this is not an error condition, only call the init function if
>> the device is enabled, which also inhibits the spurious error messages.
>> 
>> Signed-off-by: Ard Biesheuvel 
>> ---
>> drivers/iommu/of_iommu.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
>> index 2683e9fc0dcf..2dd1206e6c0d 100644
>> --- a/drivers/iommu/of_iommu.c
>> +++ b/drivers/iommu/of_iommu.c
>> @@ -183,7 +183,7 @@ static int __init of_iommu_init(void)
>>  for_each_matching_node_and_match(np, matches, &match) {
>>  const of_iommu_init_fn init_fn = match->data;
>> 
>> - if (init_fn(np))
>> + if (of_device_is_available(np) && init_fn(np))
>>  pr_err("Failed to initialise IOMMU %s\n",
>>  of_node_full_name(np));
>>  }
> 
> Is there a definition of what status = "disabled" is supposed to mean for 
> an
> IOMMU? For example, that could mean that the firmware has pre-programmed 
> the
> SMMU with particular translations or memory attributes (a bit like the
> CCA=1, CPM=1, DACS=0 case in ACPI IORT), or even disabled DMA traffic
> altogether.
> 
> So I think we'd need an update to the generic IOMMU binding text to say
> exactly what the semantics are supposed to be here.
> 
 
 I agree that it might make sense to describe the behavior of the IOMMU
 when it is left in the state we found it in. But that is not the same
 as status=disabled.
 
 The DTS subtree contains loads and loads of boilerplate
 configurations, where only some pieces are enabled in the final image
 by setting status=okay. So a node that has status 'disabled' should be
 treated as 'not present', not as 'present but can be ignored under
 assumptions such and such'
 
 In other words, I think we are talking about two different issues here.
>>> 
>>> I'm not so sure... if we have a master device that has an iommus= property
>>> pointing to an IOMMU with status="disabled", I really don't know whether we
>>> should:
>>> 
>>>  1. Assume the master can do DMA with a 1:1 mapping of memory and no
>>> changes to memory attributes
>>> 
>>>  2. Assume the master can do DMA with a 1:1 mapping of memory, but
>>> potentially with changes to the attributes
>>> 
>>>  3. Assume the master can do DMA, but with some pre-existing translation
>>> (what?)
>>> 
>>>  4. Assume the master can't do DMA
>>> 
>>> and I also don't know whether the "dma-coherent" property remains valid.
>>> 
>> 
>> Ah yes. Good point.
>> 
>> So indeed, there should be some IOMMU specific status property that
>> can convey all of the above, or 1. and 4. at the minimum
> 
> FWIW, the underlying issue being addressed here should be going away now
> anyway, since the now-queued probe deferral series obviates the init_fn
> early-device-creation bodge. I've been deliberately ignoring it for some
> time for precisely that reason ;)
> 


Ok. I have also updated the Seattle firmware to remove the smmu nodes and the 
associated iommus/iommu-map properties entirely when disabling SMMU support in 
the firmware, which should address Will's concern regarding unspecified 
behavior of a disabled SMMU.

IOW, this patch can be disregarded. Thanks.

___
iommu m

Re: [PATCH V8 07/11] iommu: of: Handle IOMMU lookup failure with deferred probing or error

2017-05-03 Thread Sricharan R
Hi,

On 5/3/2017 3:54 PM, Sricharan R wrote:
> Hi Robin,
> 
> On 5/3/2017 3:24 PM, Robin Murphy wrote:
>> Hi Geert,
>>
>> On 02/05/17 19:35, Geert Uytterhoeven wrote:
>>> Hi Sricharan,
>>>
>>> On Fri, Feb 3, 2017 at 4:48 PM, Sricharan R  
>>> wrote:
 From: Laurent Pinchart 

 Failures to look up an IOMMU when parsing the DT iommus property need to
 be handled separately from the .of_xlate() failures to support deferred
 probing.

 The lack of a registered IOMMU can be caused by the lack of a driver for
 the IOMMU, the IOMMU device probe not having been performed yet, having
 been deferred, or having failed.

 The first case occurs when the device tree describes the bus master and
 IOMMU topology correctly but no device driver exists for the IOMMU yet
 or the device driver has not been compiled in. Return NULL, the caller
 will configure the device without an IOMMU.

 The second and third cases are handled by deferring the probe of the bus
 master device which will eventually get reprobed after the IOMMU.

 The last case is currently handled by deferring the probe of the bus
 master device as well. A mechanism to either configure the bus master
 device without an IOMMU or to fail the bus master device probe depending
 on whether the IOMMU is optional or mandatory would be a good
 enhancement.

 Tested-by: Marek Szyprowski 
 Signed-off-by: Laurent Pichart 
 Signed-off-by: Sricharan R 
>>>
>>> This patch broke Renesas R-Car Gen3 platforms in renesas-drivers.
>>> As the IOMMU nodes in DT are not yet enabled, all devices having iommus
>>> properties in DT now fail to probe.
>>
>> How exactly do they fail to probe? Per d7b0558230e4, if there are no ops
>> registered then they should merely defer until we reach the point of
>> giving up and ignoring the IOMMU. Is it just that you have no other
>> late-probing drivers or post-init module loads to kick the deferred
>> queue after that point? I did try to find a way to explicitly kick it
>> from a suitably late initcall, but there didn't seem to be any obvious
>> public interface - anyone have any suggestions?
>>
>> I think that's more of a general problem with the probe deferral
>> mechanism itself (I've seen the same thing happen with some of the
>> CoreSight stuff on Juno due to the number of inter-component
>> dependencies) rather than any specific fault of this series.
>>
> 
> I was thinking of an additional check like below to avoid the
> situation ?
> 
> From 499b6e662f60f23740b8880882b0a16f16434501 Mon Sep 17 00:00:00 2001
> From: Sricharan R 
> Date: Wed, 3 May 2017 13:16:59 +0530
> Subject: [PATCH] iommu: of: Fix check for returning EPROBE_DEFER
> 
> While returning EPROBE_DEFER for iommu masters
> take in to account of iommu nodes that could be
> marked in DT as 'status=disabled', in which case
> simply return NULL and let the master's probe
> continue rather than deferring.
> 
> Signed-off-by: Sricharan R 
> ---
>  drivers/iommu/of_iommu.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index 9f44ee8..e6e9bec 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -118,6 +118,7 @@ static bool of_iommu_driver_present(struct device_node 
> *np)
> 
> ops = iommu_ops_from_fwnode(fwnode);
> if ((ops && !ops->of_xlate) ||
> +   !of_device_is_available(iommu_spec->np) ||
> (!ops && !of_iommu_driver_present(iommu_spec->np)))
> return NULL;
> 

While same as the other 'status=disabled' patch [1], better not to
defer the probe itself in the case ?

[1] https://patchwork.kernel.org/patch/9681211/

Regards,
 Sricharan

-- 
"QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/arm-smmu-v3: Poll for CMDQ drain completion more effectively

2017-05-03 Thread Sunil Kovvuri
On Thu, Apr 27, 2017 at 4:43 PM,   wrote:
> From: Sunil Goutham 
>
> Modified polling on CMDQ consumer similar to how polling is done for TLB SYNC
> completion in SMMUv2 driver. Code changes are done with reference to
>
> 8513c8930069 iommu/arm-smmu: Poll for TLB sync completion more effectively
>
> Poll timeout has been increased which addresses issue of 100us timeout not
> sufficient, when command queue is full with TLB invalidation commands.
>
> Signed-off-by: Sunil Goutham 
> Signed-off-by: Geetha 
> ---
>  drivers/iommu/arm-smmu-v3.c | 15 ---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index d412bdd..34599d4 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -379,6 +379,9 @@
>  #define CMDQ_SYNC_0_CS_NONE(0UL << CMDQ_SYNC_0_CS_SHIFT)
>  #define CMDQ_SYNC_0_CS_SEV (2UL << CMDQ_SYNC_0_CS_SHIFT)
>
> +#define CMDQ_DRAIN_TIMEOUT_US  1000
> +#define CMDQ_SPIN_COUNT10
> +
>  /* Event queue */
>  #define EVTQ_ENT_DWORDS4
>  #define EVTQ_MAX_SZ_SHIFT  7
> @@ -737,7 +740,8 @@ static void queue_inc_prod(struct arm_smmu_queue *q)
>   */
>  static int queue_poll_cons(struct arm_smmu_queue *q, bool drain, bool wfe)
>  {
> -   ktime_t timeout = ktime_add_us(ktime_get(), ARM_SMMU_POLL_TIMEOUT_US);
> +   ktime_t timeout = ktime_add_us(ktime_get(), CMDQ_DRAIN_TIMEOUT_US);
> +   unsigned int spin_cnt, delay = 1;
>
> while (queue_sync_cons(q), (drain ? !queue_empty(q) : queue_full(q))) 
> {
> if (ktime_compare(ktime_get(), timeout) > 0)
> @@ -746,8 +750,13 @@ static int queue_poll_cons(struct arm_smmu_queue *q, 
> bool drain, bool wfe)
> if (wfe) {
> wfe();
> } else {
> -   cpu_relax();
> -   udelay(1);
> +   for (spin_cnt = 0;
> +spin_cnt < CMDQ_SPIN_COUNT; spin_cnt++) {
> +   cpu_relax();
> +   continue;
> +   }
> +   udelay(delay);
> +   delay *= 2;
> }
> }
>
> --
> 2.7.4
>

Sorry for the ignorance.
Is there a patchwork where I can check current status of ARM IOMMU
related patches ?

And is this patch accepted, if not any comments / feedback ?

Thanks,
Sunil.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


AMD Ryzen KVM/NPT/IOMMU issue

2017-05-03 Thread Matthias Ehrenfeuchter

Hi,

There are a lot of messages/threads out there about bad performance 
while using AMDs Ryzen with KVM GPU passthrough. It revolves all on 
enabling/disabling npt, while enabled overall VM performance is nice but 
the GPU performance gives me about 20% (and a lot of drops to zero GPU 
usage, while CPU/Disk/Ram also doing nothing) compared to npt disabled. 
But while npt is disabled overall VM performance is like beeing on 4x86 
with floppy disk as only storage. (Ex. it takes 2 seconds just to open 
startmenu while host and vm are in idle, and neither CPU pinning, 
changing CPU model, changing storage device nor using hugepages changed 
anything).


So everything I read pointed to a bug in the npt implementation? 
Anything I could do to get closer to the "thing" issuing this?


Best Regards

efeu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/rockchip: Enable Rockchip IOMMU on ARM64

2017-05-03 Thread Matthias Brugger
From: Simon Xue 

This patch makes it possible to compile the rockchip-iommu driver on
ARM64, so that it can be used with 64-bit SoCs equipped with this type
of IOMMU.

Signed-off-by: Simon Xue 
Signed-off-by: Shunqian Zheng 
Signed-off-by: Tomasz Figa 
Reviewed-by: Matthias Brugger 
---
 drivers/iommu/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 6ee3a25ae731..99c6366a2551 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -219,7 +219,7 @@ config OMAP_IOMMU_DEBUG
 
 config ROCKCHIP_IOMMU
bool "Rockchip IOMMU Support"
-   depends on ARM
+   depends on ARM || ARM64
depends on ARCH_ROCKCHIP || COMPILE_TEST
select IOMMU_API
select ARM_DMA_USE_IOMMU
-- 
2.12.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/arm-smmu-v3: Poll for CMDQ drain completion more effectively

2017-05-03 Thread Robin Murphy
On 27/04/17 12:13, sunil.kovv...@gmail.com wrote:
> From: Sunil Goutham 
> 
> Modified polling on CMDQ consumer similar to how polling is done for TLB SYNC
> completion in SMMUv2 driver. Code changes are done with reference to
> 
> 8513c8930069 iommu/arm-smmu: Poll for TLB sync completion more effectively
> 
> Poll timeout has been increased which addresses issue of 100us timeout not
> sufficient, when command queue is full with TLB invalidation commands.
> 
> Signed-off-by: Sunil Goutham 
> Signed-off-by: Geetha 
> ---
>  drivers/iommu/arm-smmu-v3.c | 15 ---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index d412bdd..34599d4 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -379,6 +379,9 @@
>  #define CMDQ_SYNC_0_CS_NONE  (0UL << CMDQ_SYNC_0_CS_SHIFT)
>  #define CMDQ_SYNC_0_CS_SEV   (2UL << CMDQ_SYNC_0_CS_SHIFT)
>  
> +#define CMDQ_DRAIN_TIMEOUT_US1000
> +#define CMDQ_SPIN_COUNT  10
> +
>  /* Event queue */
>  #define EVTQ_ENT_DWORDS  4
>  #define EVTQ_MAX_SZ_SHIFT7
> @@ -737,7 +740,8 @@ static void queue_inc_prod(struct arm_smmu_queue *q)
>   */
>  static int queue_poll_cons(struct arm_smmu_queue *q, bool drain, bool wfe)
>  {
> - ktime_t timeout = ktime_add_us(ktime_get(), ARM_SMMU_POLL_TIMEOUT_US);
> + ktime_t timeout = ktime_add_us(ktime_get(), CMDQ_DRAIN_TIMEOUT_US);
> + unsigned int spin_cnt, delay = 1;
>  
>   while (queue_sync_cons(q), (drain ? !queue_empty(q) : queue_full(q))) {
>   if (ktime_compare(ktime_get(), timeout) > 0)
> @@ -746,8 +750,13 @@ static int queue_poll_cons(struct arm_smmu_queue *q, 
> bool drain, bool wfe)
>   if (wfe) {
>   wfe();
>   } else {
> - cpu_relax();
> - udelay(1);
> + for (spin_cnt = 0;
> +  spin_cnt < CMDQ_SPIN_COUNT; spin_cnt++) {
> + cpu_relax();
> + continue;
> + }
> + udelay(delay);
> + delay *= 2;

Sorry, I can't make sense of this. The referenced commit uses the spin
loop to poll opportunistically a few times before delaying. This loop
just adds a short open-coded udelay to an exponential udelay, and it's
not really clear that that's any better than a fixed udelay (especially
as the two cases in which we poll are somewhat different).

What's wrong with simply increasing the timeout value alone?

Robin.

>   }
>   }
>  
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/arm-smmu-v3: Poll for CMDQ drain completion more effectively

2017-05-03 Thread Will Deacon
On Wed, May 03, 2017 at 06:49:09PM +0530, Sunil Kovvuri wrote:
> On Thu, Apr 27, 2017 at 4:43 PM,   wrote:
> > From: Sunil Goutham 
> >
> > Modified polling on CMDQ consumer similar to how polling is done for TLB 
> > SYNC
> > completion in SMMUv2 driver. Code changes are done with reference to
> >
> > 8513c8930069 iommu/arm-smmu: Poll for TLB sync completion more effectively
> >
> > Poll timeout has been increased which addresses issue of 100us timeout not
> > sufficient, when command queue is full with TLB invalidation commands.
> >
> > Signed-off-by: Sunil Goutham 
> > Signed-off-by: Geetha 
> > ---
> >  drivers/iommu/arm-smmu-v3.c | 15 ---
> >  1 file changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> > index d412bdd..34599d4 100644
> > --- a/drivers/iommu/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm-smmu-v3.c
> > @@ -379,6 +379,9 @@
> >  #define CMDQ_SYNC_0_CS_NONE(0UL << CMDQ_SYNC_0_CS_SHIFT)
> >  #define CMDQ_SYNC_0_CS_SEV (2UL << CMDQ_SYNC_0_CS_SHIFT)
> >
> > +#define CMDQ_DRAIN_TIMEOUT_US  1000
> > +#define CMDQ_SPIN_COUNT10
> > +
> >  /* Event queue */
> >  #define EVTQ_ENT_DWORDS4
> >  #define EVTQ_MAX_SZ_SHIFT  7
> > @@ -737,7 +740,8 @@ static void queue_inc_prod(struct arm_smmu_queue *q)
> >   */
> >  static int queue_poll_cons(struct arm_smmu_queue *q, bool drain, bool wfe)
> >  {
> > -   ktime_t timeout = ktime_add_us(ktime_get(), 
> > ARM_SMMU_POLL_TIMEOUT_US);
> > +   ktime_t timeout = ktime_add_us(ktime_get(), CMDQ_DRAIN_TIMEOUT_US);
> > +   unsigned int spin_cnt, delay = 1;
> >
> > while (queue_sync_cons(q), (drain ? !queue_empty(q) : 
> > queue_full(q))) {
> > if (ktime_compare(ktime_get(), timeout) > 0)
> > @@ -746,8 +750,13 @@ static int queue_poll_cons(struct arm_smmu_queue *q, 
> > bool drain, bool wfe)
> > if (wfe) {
> > wfe();
> > } else {
> > -   cpu_relax();
> > -   udelay(1);
> > +   for (spin_cnt = 0;
> > +spin_cnt < CMDQ_SPIN_COUNT; spin_cnt++) {
> > +   cpu_relax();
> > +   continue;
> > +   }
> > +   udelay(delay);
> > +   delay *= 2;
> > }
> > }
> >
> > --
> > 2.7.4
> >
> 
> Sorry for the ignorance.
> Is there a patchwork where I can check current status of ARM IOMMU
> related patches ?
> 
> And is this patch accepted, if not any comments / feedback ?

Please be patient: the merge window is open and it's not been long since you
posted the patch, which looks pretty bonkers at first glance.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/arm-smmu-v3: Poll for CMDQ drain completion more effectively

2017-05-03 Thread Will Deacon
On Wed, May 03, 2017 at 04:33:57PM +0100, Robin Murphy wrote:
> On 27/04/17 12:13, sunil.kovv...@gmail.com wrote:
> > From: Sunil Goutham 
> > 
> > Modified polling on CMDQ consumer similar to how polling is done for TLB 
> > SYNC
> > completion in SMMUv2 driver. Code changes are done with reference to
> > 
> > 8513c8930069 iommu/arm-smmu: Poll for TLB sync completion more effectively
> > 
> > Poll timeout has been increased which addresses issue of 100us timeout not
> > sufficient, when command queue is full with TLB invalidation commands.
> > 
> > Signed-off-by: Sunil Goutham 
> > Signed-off-by: Geetha 
> > ---
> >  drivers/iommu/arm-smmu-v3.c | 15 ---
> >  1 file changed, 12 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> > index d412bdd..34599d4 100644
> > --- a/drivers/iommu/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm-smmu-v3.c
> > @@ -379,6 +379,9 @@
> >  #define CMDQ_SYNC_0_CS_NONE(0UL << CMDQ_SYNC_0_CS_SHIFT)
> >  #define CMDQ_SYNC_0_CS_SEV (2UL << CMDQ_SYNC_0_CS_SHIFT)
> >  
> > +#define CMDQ_DRAIN_TIMEOUT_US  1000
> > +#define CMDQ_SPIN_COUNT10
> > +
> >  /* Event queue */
> >  #define EVTQ_ENT_DWORDS4
> >  #define EVTQ_MAX_SZ_SHIFT  7
> > @@ -737,7 +740,8 @@ static void queue_inc_prod(struct arm_smmu_queue *q)
> >   */
> >  static int queue_poll_cons(struct arm_smmu_queue *q, bool drain, bool wfe)
> >  {
> > -   ktime_t timeout = ktime_add_us(ktime_get(), ARM_SMMU_POLL_TIMEOUT_US);
> > +   ktime_t timeout = ktime_add_us(ktime_get(), CMDQ_DRAIN_TIMEOUT_US);
> > +   unsigned int spin_cnt, delay = 1;
> >  
> > while (queue_sync_cons(q), (drain ? !queue_empty(q) : queue_full(q))) {
> > if (ktime_compare(ktime_get(), timeout) > 0)
> > @@ -746,8 +750,13 @@ static int queue_poll_cons(struct arm_smmu_queue *q, 
> > bool drain, bool wfe)
> > if (wfe) {
> > wfe();
> > } else {
> > -   cpu_relax();
> > -   udelay(1);
> > +   for (spin_cnt = 0;
> > +spin_cnt < CMDQ_SPIN_COUNT; spin_cnt++) {
> > +   cpu_relax();
> > +   continue;
> > +   }
> > +   udelay(delay);
> > +   delay *= 2;
> 
> Sorry, I can't make sense of this. The referenced commit uses the spin
> loop to poll opportunistically a few times before delaying. This loop
> just adds a short open-coded udelay to an exponential udelay, and it's
> not really clear that that's any better than a fixed udelay (especially
> as the two cases in which we poll are somewhat different).
> 
> What's wrong with simply increasing the timeout value alone?

I asked that the timeout is only increased for the drain case, and that
we fix the issue here where we udelat if cons didn't move immediately:

http://lists.infradead.org/pipermail/linux-arm-kernel/2017-April/503389.html

but I don't think the patch above actually achieves any of that.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/arm-smmu-v3: Poll for CMDQ drain completion more effectively

2017-05-03 Thread Sunil Kovvuri
On Wed, May 3, 2017 at 9:07 PM, Will Deacon  wrote:
> On Wed, May 03, 2017 at 06:49:09PM +0530, Sunil Kovvuri wrote:
>> On Thu, Apr 27, 2017 at 4:43 PM,   wrote:
>> > From: Sunil Goutham 
>> >
>> > Modified polling on CMDQ consumer similar to how polling is done for TLB 
>> > SYNC
>> > completion in SMMUv2 driver. Code changes are done with reference to
>> >
>> > 8513c8930069 iommu/arm-smmu: Poll for TLB sync completion more effectively
>> >
>> > Poll timeout has been increased which addresses issue of 100us timeout not
>> > sufficient, when command queue is full with TLB invalidation commands.
>> >
>> > Signed-off-by: Sunil Goutham 
>> > Signed-off-by: Geetha 
>> > ---
>> >  drivers/iommu/arm-smmu-v3.c | 15 ---
>> >  1 file changed, 12 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> > index d412bdd..34599d4 100644
>> > --- a/drivers/iommu/arm-smmu-v3.c
>> > +++ b/drivers/iommu/arm-smmu-v3.c
>> > @@ -379,6 +379,9 @@
>> >  #define CMDQ_SYNC_0_CS_NONE(0UL << CMDQ_SYNC_0_CS_SHIFT)
>> >  #define CMDQ_SYNC_0_CS_SEV (2UL << CMDQ_SYNC_0_CS_SHIFT)
>> >
>> > +#define CMDQ_DRAIN_TIMEOUT_US  1000
>> > +#define CMDQ_SPIN_COUNT10
>> > +
>> >  /* Event queue */
>> >  #define EVTQ_ENT_DWORDS4
>> >  #define EVTQ_MAX_SZ_SHIFT  7
>> > @@ -737,7 +740,8 @@ static void queue_inc_prod(struct arm_smmu_queue *q)
>> >   */
>> >  static int queue_poll_cons(struct arm_smmu_queue *q, bool drain, bool wfe)
>> >  {
>> > -   ktime_t timeout = ktime_add_us(ktime_get(), 
>> > ARM_SMMU_POLL_TIMEOUT_US);
>> > +   ktime_t timeout = ktime_add_us(ktime_get(), CMDQ_DRAIN_TIMEOUT_US);
>> > +   unsigned int spin_cnt, delay = 1;
>> >
>> > while (queue_sync_cons(q), (drain ? !queue_empty(q) : 
>> > queue_full(q))) {
>> > if (ktime_compare(ktime_get(), timeout) > 0)
>> > @@ -746,8 +750,13 @@ static int queue_poll_cons(struct arm_smmu_queue *q, 
>> > bool drain, bool wfe)
>> > if (wfe) {
>> > wfe();
>> > } else {
>> > -   cpu_relax();
>> > -   udelay(1);
>> > +   for (spin_cnt = 0;
>> > +spin_cnt < CMDQ_SPIN_COUNT; spin_cnt++) {
>> > +   cpu_relax();
>> > +   continue;
>> > +   }
>> > +   udelay(delay);
>> > +   delay *= 2;
>> > }
>> > }
>> >
>> > --
>> > 2.7.4
>> >
>>
>> Sorry for the ignorance.
>> Is there a patchwork where I can check current status of ARM IOMMU
>> related patches ?
>>
>> And is this patch accepted, if not any comments / feedback ?
>
> Please be patient: the merge window is open and it's not been long since you
> posted the patch, which looks pretty bonkers at first glance.
>
> Will

Look at this
https://lkml.org/lkml/2017/4/3/605
The same thing, i pinged after a week and you said you already picked it up.
All I am asking is how do i know the current status, how many days
would normally
be considered being patient ?
Instead of troubling you, is there a patchwork where i can check the status ?

Thanks,
Sunil.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/arm-smmu-v3: Poll for CMDQ drain completion more effectively

2017-05-03 Thread Will Deacon
On Wed, May 03, 2017 at 09:24:13PM +0530, Sunil Kovvuri wrote:
> On Wed, May 3, 2017 at 9:07 PM, Will Deacon  wrote:
> > On Wed, May 03, 2017 at 06:49:09PM +0530, Sunil Kovvuri wrote:
> >> On Thu, Apr 27, 2017 at 4:43 PM,   wrote:
> >> > From: Sunil Goutham 
> >> >
> >> > Modified polling on CMDQ consumer similar to how polling is done for TLB 
> >> > SYNC
> >> > completion in SMMUv2 driver. Code changes are done with reference to
> >> >
> >> > 8513c8930069 iommu/arm-smmu: Poll for TLB sync completion more 
> >> > effectively
> >> >
> >> > Poll timeout has been increased which addresses issue of 100us timeout 
> >> > not
> >> > sufficient, when command queue is full with TLB invalidation commands.
> >> >
> >> > Signed-off-by: Sunil Goutham 
> >> > Signed-off-by: Geetha 
> >> > ---
> >> >  drivers/iommu/arm-smmu-v3.c | 15 ---
> >> >  1 file changed, 12 insertions(+), 3 deletions(-)
> >> >
> >> > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> >> > index d412bdd..34599d4 100644
> >> > --- a/drivers/iommu/arm-smmu-v3.c
> >> > +++ b/drivers/iommu/arm-smmu-v3.c
> >> > @@ -379,6 +379,9 @@
> >> >  #define CMDQ_SYNC_0_CS_NONE(0UL << CMDQ_SYNC_0_CS_SHIFT)
> >> >  #define CMDQ_SYNC_0_CS_SEV (2UL << CMDQ_SYNC_0_CS_SHIFT)
> >> >
> >> > +#define CMDQ_DRAIN_TIMEOUT_US  1000
> >> > +#define CMDQ_SPIN_COUNT10
> >> > +
> >> >  /* Event queue */
> >> >  #define EVTQ_ENT_DWORDS4
> >> >  #define EVTQ_MAX_SZ_SHIFT  7
> >> > @@ -737,7 +740,8 @@ static void queue_inc_prod(struct arm_smmu_queue *q)
> >> >   */
> >> >  static int queue_poll_cons(struct arm_smmu_queue *q, bool drain, bool 
> >> > wfe)
> >> >  {
> >> > -   ktime_t timeout = ktime_add_us(ktime_get(), 
> >> > ARM_SMMU_POLL_TIMEOUT_US);
> >> > +   ktime_t timeout = ktime_add_us(ktime_get(), 
> >> > CMDQ_DRAIN_TIMEOUT_US);
> >> > +   unsigned int spin_cnt, delay = 1;
> >> >
> >> > while (queue_sync_cons(q), (drain ? !queue_empty(q) : 
> >> > queue_full(q))) {
> >> > if (ktime_compare(ktime_get(), timeout) > 0)
> >> > @@ -746,8 +750,13 @@ static int queue_poll_cons(struct arm_smmu_queue 
> >> > *q, bool drain, bool wfe)
> >> > if (wfe) {
> >> > wfe();
> >> > } else {
> >> > -   cpu_relax();
> >> > -   udelay(1);
> >> > +   for (spin_cnt = 0;
> >> > +spin_cnt < CMDQ_SPIN_COUNT; spin_cnt++) {
> >> > +   cpu_relax();
> >> > +   continue;
> >> > +   }
> >> > +   udelay(delay);
> >> > +   delay *= 2;
> >> > }
> >> > }
> >> >
> >> > --
> >> > 2.7.4
> >> >
> >>
> >> Sorry for the ignorance.
> >> Is there a patchwork where I can check current status of ARM IOMMU
> >> related patches ?
> >>
> >> And is this patch accepted, if not any comments / feedback ?
> >
> > Please be patient: the merge window is open and it's not been long since you
> > posted the patch, which looks pretty bonkers at first glance.
> >
> > Will
> 
> Look at this
> https://lkml.org/lkml/2017/4/3/605
> The same thing, i pinged after a week and you said you already picked it up.
> All I am asking is how do i know the current status, how many days
> would normally
> be considered being patient ?

At least wait until the merge window is over if it's not a fix, or keep an
eye on the relevant branches (see below).

> Instead of troubling you, is there a patchwork where i can check the status ?

No, but I pick patches up on my iommu/devel branch here:

https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/

and at some point they appear on for-joerg/arm-smmu/updates, which I send
to Joerg (who is the iommu maintainer). He then puts them into linux-next
before they get sent for inclusion in mainline during the next merge window.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/arm-smmu-v3: Poll for CMDQ drain completion more effectively

2017-05-03 Thread Sunil Kovvuri
On Wed, May 3, 2017 at 9:10 PM, Will Deacon  wrote:
> On Wed, May 03, 2017 at 04:33:57PM +0100, Robin Murphy wrote:
>> On 27/04/17 12:13, sunil.kovv...@gmail.com wrote:
>> > From: Sunil Goutham 
>> >
>> > Modified polling on CMDQ consumer similar to how polling is done for TLB 
>> > SYNC
>> > completion in SMMUv2 driver. Code changes are done with reference to
>> >
>> > 8513c8930069 iommu/arm-smmu: Poll for TLB sync completion more effectively
>> >
>> > Poll timeout has been increased which addresses issue of 100us timeout not
>> > sufficient, when command queue is full with TLB invalidation commands.
>> >
>> > Signed-off-by: Sunil Goutham 
>> > Signed-off-by: Geetha 
>> > ---
>> >  drivers/iommu/arm-smmu-v3.c | 15 ---
>> >  1 file changed, 12 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> > index d412bdd..34599d4 100644
>> > --- a/drivers/iommu/arm-smmu-v3.c
>> > +++ b/drivers/iommu/arm-smmu-v3.c
>> > @@ -379,6 +379,9 @@
>> >  #define CMDQ_SYNC_0_CS_NONE(0UL << CMDQ_SYNC_0_CS_SHIFT)
>> >  #define CMDQ_SYNC_0_CS_SEV (2UL << CMDQ_SYNC_0_CS_SHIFT)
>> >
>> > +#define CMDQ_DRAIN_TIMEOUT_US  1000
>> > +#define CMDQ_SPIN_COUNT10
>> > +
>> >  /* Event queue */
>> >  #define EVTQ_ENT_DWORDS4
>> >  #define EVTQ_MAX_SZ_SHIFT  7
>> > @@ -737,7 +740,8 @@ static void queue_inc_prod(struct arm_smmu_queue *q)
>> >   */
>> >  static int queue_poll_cons(struct arm_smmu_queue *q, bool drain, bool wfe)
>> >  {
>> > -   ktime_t timeout = ktime_add_us(ktime_get(), ARM_SMMU_POLL_TIMEOUT_US);
>> > +   ktime_t timeout = ktime_add_us(ktime_get(), CMDQ_DRAIN_TIMEOUT_US);
>> > +   unsigned int spin_cnt, delay = 1;
>> >
>> > while (queue_sync_cons(q), (drain ? !queue_empty(q) : queue_full(q))) {
>> > if (ktime_compare(ktime_get(), timeout) > 0)
>> > @@ -746,8 +750,13 @@ static int queue_poll_cons(struct arm_smmu_queue *q, 
>> > bool drain, bool wfe)
>> > if (wfe) {
>> > wfe();
>> > } else {
>> > -   cpu_relax();
>> > -   udelay(1);
>> > +   for (spin_cnt = 0;
>> > +spin_cnt < CMDQ_SPIN_COUNT; spin_cnt++) {
>> > +   cpu_relax();
>> > +   continue;
>> > +   }
>> > +   udelay(delay);
>> > +   delay *= 2;
>>
>> Sorry, I can't make sense of this. The referenced commit uses the spin
>> loop to poll opportunistically a few times before delaying. This loop
>> just adds a short open-coded udelay to an exponential udelay, and it's
>> not really clear that that's any better than a fixed udelay (especially
>> as the two cases in which we poll are somewhat different).
>>
>> What's wrong with simply increasing the timeout value alone?
>
> I asked that the timeout is only increased for the drain case, and that
> we fix the issue here where we udelat if cons didn't move immediately:
>
> http://lists.infradead.org/pipermail/linux-arm-kernel/2017-April/503389.html
>
> but I don't think the patch above actually achieves any of that.
>
> Will

Sorry, I completely screwed up the spin poll above.
Will resubmit.

Thanks,
Sunil.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/arm-smmu-v3: Poll for CMDQ drain completion more effectively

2017-05-03 Thread Sunil Kovvuri
On Wed, May 3, 2017 at 9:29 PM, Will Deacon  wrote:
> On Wed, May 03, 2017 at 09:24:13PM +0530, Sunil Kovvuri wrote:
>> On Wed, May 3, 2017 at 9:07 PM, Will Deacon  wrote:
>> > On Wed, May 03, 2017 at 06:49:09PM +0530, Sunil Kovvuri wrote:
>> >> On Thu, Apr 27, 2017 at 4:43 PM,   wrote:
>> >> > From: Sunil Goutham 
>> >> >
>> >> > Modified polling on CMDQ consumer similar to how polling is done for 
>> >> > TLB SYNC
>> >> > completion in SMMUv2 driver. Code changes are done with reference to
>> >> >
>> >> > 8513c8930069 iommu/arm-smmu: Poll for TLB sync completion more 
>> >> > effectively
>> >> >
>> >> > Poll timeout has been increased which addresses issue of 100us timeout 
>> >> > not
>> >> > sufficient, when command queue is full with TLB invalidation commands.
>> >> >
>> >> > Signed-off-by: Sunil Goutham 
>> >> > Signed-off-by: Geetha 
>> >> > ---
>> >> >  drivers/iommu/arm-smmu-v3.c | 15 ---
>> >> >  1 file changed, 12 insertions(+), 3 deletions(-)
>> >> >
>> >> > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> >> > index d412bdd..34599d4 100644
>> >> > --- a/drivers/iommu/arm-smmu-v3.c
>> >> > +++ b/drivers/iommu/arm-smmu-v3.c
>> >> > @@ -379,6 +379,9 @@
>> >> >  #define CMDQ_SYNC_0_CS_NONE(0UL << CMDQ_SYNC_0_CS_SHIFT)
>> >> >  #define CMDQ_SYNC_0_CS_SEV (2UL << CMDQ_SYNC_0_CS_SHIFT)
>> >> >
>> >> > +#define CMDQ_DRAIN_TIMEOUT_US  1000
>> >> > +#define CMDQ_SPIN_COUNT10
>> >> > +
>> >> >  /* Event queue */
>> >> >  #define EVTQ_ENT_DWORDS4
>> >> >  #define EVTQ_MAX_SZ_SHIFT  7
>> >> > @@ -737,7 +740,8 @@ static void queue_inc_prod(struct arm_smmu_queue *q)
>> >> >   */
>> >> >  static int queue_poll_cons(struct arm_smmu_queue *q, bool drain, bool 
>> >> > wfe)
>> >> >  {
>> >> > -   ktime_t timeout = ktime_add_us(ktime_get(), 
>> >> > ARM_SMMU_POLL_TIMEOUT_US);
>> >> > +   ktime_t timeout = ktime_add_us(ktime_get(), 
>> >> > CMDQ_DRAIN_TIMEOUT_US);
>> >> > +   unsigned int spin_cnt, delay = 1;
>> >> >
>> >> > while (queue_sync_cons(q), (drain ? !queue_empty(q) : 
>> >> > queue_full(q))) {
>> >> > if (ktime_compare(ktime_get(), timeout) > 0)
>> >> > @@ -746,8 +750,13 @@ static int queue_poll_cons(struct arm_smmu_queue 
>> >> > *q, bool drain, bool wfe)
>> >> > if (wfe) {
>> >> > wfe();
>> >> > } else {
>> >> > -   cpu_relax();
>> >> > -   udelay(1);
>> >> > +   for (spin_cnt = 0;
>> >> > +spin_cnt < CMDQ_SPIN_COUNT; spin_cnt++) {
>> >> > +   cpu_relax();
>> >> > +   continue;
>> >> > +   }
>> >> > +   udelay(delay);
>> >> > +   delay *= 2;
>> >> > }
>> >> > }
>> >> >
>> >> > --
>> >> > 2.7.4
>> >> >
>> >>
>> >> Sorry for the ignorance.
>> >> Is there a patchwork where I can check current status of ARM IOMMU
>> >> related patches ?
>> >>
>> >> And is this patch accepted, if not any comments / feedback ?
>> >
>> > Please be patient: the merge window is open and it's not been long since 
>> > you
>> > posted the patch, which looks pretty bonkers at first glance.
>> >
>> > Will
>>
>> Look at this
>> https://lkml.org/lkml/2017/4/3/605
>> The same thing, i pinged after a week and you said you already picked it up.
>> All I am asking is how do i know the current status, how many days
>> would normally
>> be considered being patient ?
>
> At least wait until the merge window is over if it's not a fix, or keep an
> eye on the relevant branches (see below).
>
>> Instead of troubling you, is there a patchwork where i can check the status ?
>
> No, but I pick patches up on my iommu/devel branch here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/
>
> and at some point they appear on for-joerg/arm-smmu/updates, which I send
> to Joerg (who is the iommu maintainer). He then puts them into linux-next
> before they get sent for inclusion in mainline during the next merge window.
>
> Will

Thanks for the info.

Sunil.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: AMD Ryzen KVM/NPT/IOMMU issue

2017-05-03 Thread Nick Sarnie
On Wed, May 3, 2017 at 10:37 AM, Matthias Ehrenfeuchter  wrote:
> Hi,
>
> There are a lot of messages/threads out there about bad performance while
> using AMDs Ryzen with KVM GPU passthrough. It revolves all on
> enabling/disabling npt, while enabled overall VM performance is nice but the
> GPU performance gives me about 20% (and a lot of drops to zero GPU usage,
> while CPU/Disk/Ram also doing nothing) compared to npt disabled. But while
> npt is disabled overall VM performance is like beeing on 4x86 with floppy
> disk as only storage. (Ex. it takes 2 seconds just to open startmenu while
> host and vm are in idle, and neither CPU pinning, changing CPU model,
> changing storage device nor using hugepages changed anything).
>
> So everything I read pointed to a bug in the npt implementation? Anything I
> could do to get closer to the "thing" issuing this?
>
> Best Regards
>
> efeu
> ___
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu

I heard from Joerg that it might be related to a lower intercept rate
being used when NPT is enabled, but we haven't been able to find a way
to trace that to confirm.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters

2017-05-03 Thread Rob Herring
On Tue, May 2, 2017 at 11:46 PM, Oza Pawandeep  wrote:
> current device framework and of framework integration assumes
> dma-ranges in a way where memory-mapped devices define their
> dma-ranges. (child-bus-address, parent-bus-address, length).
>
> of_dma_configure is specifically written to take care of memory
> mapped devices. but no implementation exists for pci to take
> care of pcie based memory ranges.
>
> for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
> world dma-ranges.
> dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
>
> this patch serves following:
>
> 1) exposes interface to the pci host driver for their
> inbound memory ranges
>
> 2) provide an interface to callers such as of_dma_get_ranges.
> so then the returned size get best possible (largest) dma_mask.
> because PCI RC drivers do not call APIs such as
> dma_set_coherent_mask() and hence rather it shows its addressing
> capabilities based on dma-ranges.
> for e.g.
> dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
> we should get dev->coherent_dma_mask=0x7f.
>
> 3) this patch handles multiple inbound windows and dma-ranges.
> it is left to the caller, how it wants to use them.
> the new function returns the resources in a standard and unform way
>
> 4) this way the callers of for e.g. of_dma_get_ranges
> does not need to change.
>
> 5) leaves scope of adding PCI flag handling for inbound memory
> by the new function.
>
> Bug: SOC-5216
> Change-Id: Ie045386df91e1e0587846bb147ae40d96f6d7d2e
> Signed-off-by: Oza Pawandeep 
> Reviewed-on: http://gerrit-ccxsw.broadcom.net/40428
> Reviewed-by: vpx_checkpatch status 
> Reviewed-by: CCXSW 
> Reviewed-by: Ray Jui 
> Tested-by: vpx_autobuild status 
> Tested-by: vpx_smoketest status 
> Tested-by: CCXSW 

All these non-person, probably internal Broadcom Tested-by and
Reviewed-by's should be removed too.

> Reviewed-by: Scott Branden 
>

Rob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges

2017-05-03 Thread Rob Herring
On Tue, May 2, 2017 at 11:46 PM, Oza Pawandeep  wrote:
> current device framework and of framework integration assumes
> dma-ranges in a way where memory-mapped devices define their
> dma-ranges. (child-bus-address, parent-bus-address, length).
>
> of_dma_configure is specifically written to take care of memory
> mapped devices. but no implementation exists for pci to take
> care of pcie based memory ranges.
>
> for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
> world dma-ranges.
> dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
>
> this patch fixes this patch fixes the bug in of_dma_get_range,
> which with as is, parses the PCI memory ranges and return wrong
> size as 0.
>
> in order to get largest possible dma_mask. this patch also
> retuns the largest possible size based on dma-ranges,
>
> for e.g.
> dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
> we should get dev->coherent_dma_mask=0x7f.
>
> based on which iova allocation space will honour PCI host
> bridge limitations.
>
> Bug: SOC-5216
> Change-Id: I4c534bdd17e70c6b27327d39d1656e8ed0cf56d6
> Signed-off-by: Oza Pawandeep 
> Reviewed-on: http://gerrit-ccxsw.broadcom.net/40762
> Reviewed-by: vpx_checkpatch status 
> Reviewed-by: CCXSW 
> Reviewed-by: Scott Branden 
> Tested-by: vpx_autobuild status 
> Tested-by: vpx_smoketest status 
>
> diff --git a/drivers/of/address.c b/drivers/of/address.c
> index 02b2903..f7734fc 100644
> --- a/drivers/of/address.c
> +++ b/drivers/of/address.c
> @@ -6,6 +6,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -830,6 +831,54 @@ int of_dma_get_range(struct device_node *np, u64 
> *dma_addr, u64 *paddr, u64 *siz
> int ret = 0;
> u64 dmaaddr;
>
> +#ifdef CONFIG_PCI
> +   struct resource_entry *window;
> +   LIST_HEAD(res);
> +
> +   if (!node)
> +   return -EINVAL;
> +
> +   if (of_bus_pci_match(np)) {

You are not following what I'm saying. Let me spell it out:

- Add a get_dma_ranges() function to of_bus struct. Or maybe should
cover ranges too (e.g. get_ranges). I'm not sure.
- Convert existing contents of this function to
of_bus_default_dma_get_ranges and add that to the default of_bus
struct.
- Make of_dma_get_range call of_bus_match() and then bus->get_dma_ranges.

Rob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu