Re: [PATCH v7 16/16] drivers: acpi: iort: introduce iort_iommu_configure

2016-11-15 Thread Rafael J. Wysocki
On Wed, Nov 9, 2016 at 3:19 PM, Lorenzo Pieralisi
 wrote:
> DT based systems have a generic kernel API to configure IOMMUs
> for devices (ie of_iommu_configure()).
>
> On ARM-based ACPI systems, the of_iommu_configure() equivalent can
> be implemented on top of the ACPI IORT kernel API, with the corresponding
> functions to map device identifiers to IOMMUs and retrieve the
> corresponding IOMMU operations necessary for DMA operations set-up.
>
> By relying on the iommu_fwspec generic kernel infrastructure,
> implement the IORT based IOMMU configuration for ARM ACPI systems
> and hook it up in the ACPI kernel layer that implements DMA
> configuration for a device.
>
> Signed-off-by: Lorenzo Pieralisi 
> Tested-by: Hanjun Guo 
> Tested-by: Tomasz Nowicki 
> Cc: Hanjun Guo 
> Cc: Tomasz Nowicki 
> Cc: "Rafael J. Wysocki" 

For the ACPI core part:

Acked-by: Rafael J. Wysocki 


Re: [PATCH v7 07/16] drivers: acpi: implement acpi_dma_configure

2016-11-15 Thread Rafael J. Wysocki
On Wed, Nov 9, 2016 at 3:19 PM, Lorenzo Pieralisi
 wrote:
> On DT based systems, the of_dma_configure() API implements DMA
> configuration for a given device. On ACPI systems an API equivalent to
> of_dma_configure() is missing which implies that it is currently not
> possible to set-up DMA operations for devices through the ACPI generic
> kernel layer.
>
> This patch fills the gap by introducing acpi_dma_configure/deconfigure()
> calls that for now are just wrappers around arch_setup_dma_ops() and
> arch_teardown_dma_ops() and also updates ACPI and PCI core code to use
> the newly introduced acpi_dma_configure/acpi_dma_deconfigure functions.
>
> Since acpi_dma_configure() is used to configure DMA operations, the
> function initializes the dma/coherent_dma masks to sane default values
> if the current masks are uninitialized (also to keep the default values
> consistent with DT systems) to make sure the device has a complete
> default DMA set-up.
>
> The DMA range size passed to arch_setup_dma_ops() is sized according
> to the device coherent_dma_mask (starting at address 0x0), mirroring the
> DT probing path behaviour when a dma-ranges property is not provided
> for the device being probed; this changes the current arch_setup_dma_ops()
> call parameters in the ACPI probing case, but since arch_setup_dma_ops()
> is a NOP on all architectures but ARM/ARM64 this patch does not change
> the current kernel behaviour on them.
>
> Signed-off-by: Lorenzo Pieralisi 
> Acked-by: Bjorn Helgaas  [pci]
> Tested-by: Hanjun Guo 
> Tested-by: Tomasz Nowicki 
> Cc: Bjorn Helgaas 
> Cc: Robin Murphy 
> Cc: Tomasz Nowicki 
> Cc: Joerg Roedel 
> Cc: "Rafael J. Wysocki" 

LGTM

Acked-by: Rafael J. Wysocki 
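For illustration, a minimal sketch of the wrapper the changelog above describes (default masks plus range size). The function name and the plain bool coherency flag are simplifications, and the body is inferred from the changelog rather than copied from the patch:

#include <linux/device.h>
#include <linux/dma-mapping.h>

/* Sketch of the acpi_dma_configure() behaviour described above. */
static void example_acpi_dma_configure(struct device *dev, bool coherent)
{
	u64 size;

	/* Default masks if the driver has not set any, matching DT systems. */
	if (!dev->coherent_dma_mask)
		dev->coherent_dma_mask = DMA_BIT_MASK(32);
	if (!dev->dma_mask)
		dev->dma_mask = &dev->coherent_dma_mask;

	/* Range starts at 0x0 and is sized from the coherent mask;
	 * max() avoids overflow when the mask covers the whole space. */
	size = max(dev->coherent_dma_mask, dev->coherent_dma_mask + 1);

	/* At this point in the series the IOMMU ops are still NULL;
	 * patch 16/16 later supplies them from IORT. */
	arch_setup_dma_ops(dev, 0, size, NULL, coherent);
}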


Re: [RFC PATCH v3 04/20] x86: Handle reduction in physical address size with SME

2016-11-15 Thread Tom Lendacky
On 11/15/2016 3:33 PM, Borislav Petkov wrote:
> On Tue, Nov 15, 2016 at 03:22:45PM -0600, Tom Lendacky wrote:
>> Hmmm... I still need the ebx value from the CPUID instruction to
>> calculate the proper reduction in physical bits, so I'll still need
>> to make the CPUID call.
> 
> if (c->extended_cpuid_level >= 0x8000001f) {
> cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);
> 
>   ...
> 
> just like the rest of get_cpu_cap() :)

Right, which is what the code does now.  I was looking at switching
over to the cpu_has() function and eliminating the cpuid call, but I
still need the cpuid call for the ebx value.

Thanks,
Tom
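For reference, a sketch of what the EBX value feeds into: CPUID Fn8000_001F reports SME support in EAX[0] and the physical address bit reduction in EBX[11:6]. The function name and placement are illustrative; as discussed elsewhere in the thread, the real patch also checks the SYS_CFG MSR before applying the reduction:

#include <asm/processor.h>

static void example_sme_adjust_phys_bits(struct cpuinfo_x86 *c)
{
	u32 eax, ebx, ecx, edx;

	if (c->extended_cpuid_level < 0x8000001f)
		return;

	cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);
	if (!(eax & 0x01))	/* SME not supported */
		return;

	/* EBX[11:6]: reduction of physical address bits when SME is enabled */
	c->x86_phys_bits -= (ebx >> 6) & 0x3f;
}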

> 


Re: [RFC PATCH v3 04/20] x86: Handle reduction in physical address size with SME

2016-11-15 Thread Borislav Petkov
On Tue, Nov 15, 2016 at 03:22:45PM -0600, Tom Lendacky wrote:
> Hmmm... I still need the ebx value from the CPUID instruction to
> calculate the proper reduction in physical bits, so I'll still need
> to make the CPUID call.

if (c->extended_cpuid_level >= 0x8000001f) {
cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);

...

just like the rest of get_cpu_cap() :)

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [RFC PATCH v3 04/20] x86: Handle reduction in physical address size with SME

2016-11-15 Thread Tom Lendacky
On 11/15/2016 6:14 AM, Borislav Petkov wrote:
> On Tue, Nov 15, 2016 at 01:10:35PM +0100, Joerg Roedel wrote:
>> Maybe add a comment here why you can't use cpu_has (yet).
> 
> So that could be alleviated by moving this function *after*
> init_scattered_cpuid_features(). Then you can simply do *cpu_has().
> 

Hmmm... I still need the ebx value from the CPUID instruction to
calculate the proper reduction in physical bits, so I'll still need
to make the CPUID call.

Thanks,
Tom

> Also, I'm not sure why we're checking CPUID for the SME feature when we
> have sme_get_me_mask() et al which have been setup much earlier...
> 


Re: [RFC PATCH v3 13/20] x86: DMA support for memory encryption

2016-11-15 Thread Tom Lendacky
On 11/15/2016 12:17 PM, Radim Krčmář wrote:
> 2016-11-15 11:02-0600, Tom Lendacky:
>> On 11/15/2016 8:39 AM, Radim Krčmář wrote:
>>> 2016-11-09 18:37-0600, Tom Lendacky:
 Since DMA addresses will effectively look like 48-bit addresses when the
 memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
 device performing the DMA does not support 48-bits. SWIOTLB will be
 initialized to create un-encrypted bounce buffers for use by these devices.

 Signed-off-by: Tom Lendacky 
 ---
 diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
 @@ -64,13 +66,15 @@ static struct dma_map_ops swiotlb_dma_ops = {
   * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
   *
   * This returns non-zero if we are forced to use swiotlb (by the boot
 - * option).
 + * option). If memory encryption is enabled then swiotlb will be set
 + * to 1 so that bounce buffers are allocated and used for devices that
 + * do not support the addressing range required for the encryption mask.
   */
  int __init pci_swiotlb_detect_override(void)
  {
int use_swiotlb = swiotlb | swiotlb_force;
  
 -  if (swiotlb_force)
 +  if (swiotlb_force || sme_me_mask)
swiotlb = 1;
  
return use_swiotlb;
>>>
>>> We want to return 1 even if only sme_me_mask is 1, because the return
>>> value is used for detection.  The following would be less obscure, IMO:
>>>
>>> if (swiotlb_force || sme_me_mask)
>>> swiotlb = 1;
>>>
>>> return swiotlb;
>>
>> If we do that then all DMA would go through the swiotlb bounce buffers.
> 
> No, that is decided for example in swiotlb_map_page() and we need to
> call pci_swiotlb_init() to register that function.
> 
>> By setting swiotlb to 1 we indicate that the bounce buffers will be
>> needed for those devices that can't support the addressing range when
>> the encryption bit is set (48 bit DMA). But if the device can support
>> the addressing range we won't use the bounce buffers.
> 
> If we return 0 here, then pci_swiotlb_init() will not be called =>
> dma_ops won't be set to swiotlb_dma_ops => we won't use bounce buffers.
> 

Ok, I see why this was working for me...  By setting swiotlb = 1 and
returning 0, detection continued on to the pci_swiotlb_detect_4gb() entry,
which would return the current value of swiotlb (now 1), and so the
swiotlb ops were still set up.

So the change that you mentioned will work, thanks for pointing that out
and getting me to dig deeper on it.  I'll update the patch.
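For clarity, this is the shape of the check being agreed on here (a sketch; the final patch may differ slightly). Returning swiotlb itself lets the detection machinery see that SME forced it on, without relying on pci_swiotlb_detect_4gb() running afterwards:

int __init pci_swiotlb_detect_override(void)
{
	/* swiotlb, swiotlb_force and sme_me_mask are the existing globals */
	if (swiotlb_force || sme_me_mask)
		swiotlb = 1;

	return swiotlb;
}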

>>> We setup encrypted swiotlb and then decrypt it, but sometimes set it up
>>> decrypted (late_alloc) ... why isn't the swiotlb set up decrypted
>>> directly?
>>
>> When swiotlb is allocated in swiotlb_init(), it is too early to make
>> use of the API to change the page attributes. Because of this,
>> the callback to make those changes is needed.
> 
> Thanks. (I don't know page table setup enough to see a lesser evil. :])
> 
 @@ -541,7 +583,7 @@ static phys_addr_t
  map_single(struct device *hwdev, phys_addr_t phys, size_t size,
   enum dma_data_direction dir)
  {
 -  dma_addr_t start_dma_addr = phys_to_dma(hwdev, io_tlb_start);
 +  dma_addr_t start_dma_addr = swiotlb_phys_to_dma(hwdev, io_tlb_start);
>>>
>>> We have decrypted io_tlb_start before, so shouldn't its physical address
>>> be saved without the sme bit?  (Which changes a lot ...)
>>
>> I'm not sure what you mean here, can you elaborate a bit more?
> 
> The C-bit (sme bit) is a part of the physical address.

The C-bit (sme_me_mask) isn't part of the physical address for
io_tlb_start, but since the original call was to phys_to_dma(), which
now will automatically "or" in the C-bit, I needed to adjust that by
using swiotlb_phys_to_dma() to remove the C-bit.

> If we know that a certain physical page should be accessed as
> unencrypted (the bounce buffer) then the C-bit is 0.
> I'm wondering why we save the physical address with the C-bit set when
> we know that it can't be accessed that way (because we remove it every
> time).

It's not saved with the C-bit, but the phys_to_dma call will "or" in the
C-bit automatically.  And since this is common code I need to leave that
call to phys_to_dma in.
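A condensed sketch of the distinction being discussed, under the series' definitions (simplified; these mirror the patch's phys_to_dma() change and the swiotlb-specific helper it introduces):

/* Common DMA path: the C-bit is ORed into the DMA address. */
static inline dma_addr_t example_phys_to_dma(struct device *dev, phys_addr_t paddr)
{
	return paddr | sme_me_mask;
}

/* SWIOTLB path: bounce buffers are accessed unencrypted, so no C-bit. */
static inline dma_addr_t example_swiotlb_phys_to_dma(struct device *dev, phys_addr_t paddr)
{
	return paddr;
}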

> 
> The naming is a bit confusing, because physical addresses are actually
> virtualized by SME -- maybe we should be calling them SME addresses?

Interesting idea...  I'll have to look at how that plays out in the
patches and documentation.

Thanks,
Tom

> 

Re: [RFC PATCH v3 13/20] x86: DMA support for memory encryption

2016-11-15 Thread Michael S. Tsirkin
On Tue, Nov 15, 2016 at 12:29:35PM -0600, Tom Lendacky wrote:
> On 11/15/2016 9:16 AM, Michael S. Tsirkin wrote:
> > On Wed, Nov 09, 2016 at 06:37:23PM -0600, Tom Lendacky wrote:
> >> Since DMA addresses will effectively look like 48-bit addresses when the
> >> memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
> >> device performing the DMA does not support 48-bits. SWIOTLB will be
> >> initialized to create un-encrypted bounce buffers for use by these devices.
> >>
> >> Signed-off-by: Tom Lendacky 
> >> ---
> >>  arch/x86/include/asm/dma-mapping.h |5 ++-
> >>  arch/x86/include/asm/mem_encrypt.h |5 +++
> >>  arch/x86/kernel/pci-dma.c  |   11 ---
> >>  arch/x86/kernel/pci-nommu.c|2 +
> >>  arch/x86/kernel/pci-swiotlb.c  |8 -
> >>  arch/x86/mm/mem_encrypt.c  |   17 +++
> >>  include/linux/swiotlb.h|1 +
> >>  init/main.c|   13 
> >>  lib/swiotlb.c  |   58 
> >> +++-
> >>  9 files changed, 103 insertions(+), 17 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/dma-mapping.h 
> >> b/arch/x86/include/asm/dma-mapping.h
> >> index 4446162..c9cdcae 100644
> >> --- a/arch/x86/include/asm/dma-mapping.h
> >> +++ b/arch/x86/include/asm/dma-mapping.h
> 
> ..SNIP...
> 
> >>  
> >> +/*
> >> + * If memory encryption is active, the DMA address for an encrypted page 
> >> may
> >> + * be beyond the range of the device. If bounce buffers are required be 
> >> sure
> >> + * that they are not on an encrypted page. This should be called before 
> >> the
> >> + * iotlb area is used.
> > 
> > Makes sense, but I think at least a dmesg warning here
> > might be a good idea.
> 
> Good idea.  Should it be a warning when it is first being set up or
> a warning the first time the bounce buffers need to be used?  Or maybe
> both?
> 
> > 
> > A boot flag that says "don't enable devices that don't support
> > encryption" might be a good idea, too, since most people
> > don't read dmesg output and won't notice the message.
> 
> I'll look into this. It might be something that can be checked as
> part of the device setting its DMA mask or the first time a DMA
> API is used if the device doesn't explicitly set its mask.
> 
> Thanks,
> Tom
> 
> > 

I think setup time is nicer if it's possible.
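One possible shape for a set-up-time notice, hooked into the patch's swiotlb_clear_encryption(); the placement and wording here are assumptions, not part of the posted series:

void __init swiotlb_clear_encryption(void)
{
	if (no_iotlb_memory || late_alloc)
		return;

	pr_warn("SWIOTLB: bounce buffers are unencrypted, DMA through them is not protected by SME\n");

	/* ... existing code marking io_tlb_start and the overflow buffer
	 * as unencrypted follows here ... */
}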



Re: [RFC PATCH v3 13/20] x86: DMA support for memory encryption

2016-11-15 Thread Tom Lendacky
On 11/15/2016 9:16 AM, Michael S. Tsirkin wrote:
> On Wed, Nov 09, 2016 at 06:37:23PM -0600, Tom Lendacky wrote:
>> Since DMA addresses will effectively look like 48-bit addresses when the
>> memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
>> device performing the DMA does not support 48-bits. SWIOTLB will be
>> initialized to create un-encrypted bounce buffers for use by these devices.
>>
>> Signed-off-by: Tom Lendacky 
>> ---
>>  arch/x86/include/asm/dma-mapping.h |5 ++-
>>  arch/x86/include/asm/mem_encrypt.h |5 +++
>>  arch/x86/kernel/pci-dma.c  |   11 ---
>>  arch/x86/kernel/pci-nommu.c|2 +
>>  arch/x86/kernel/pci-swiotlb.c  |8 -
>>  arch/x86/mm/mem_encrypt.c  |   17 +++
>>  include/linux/swiotlb.h|1 +
>>  init/main.c|   13 
>>  lib/swiotlb.c  |   58 
>> +++-
>>  9 files changed, 103 insertions(+), 17 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/dma-mapping.h 
>> b/arch/x86/include/asm/dma-mapping.h
>> index 4446162..c9cdcae 100644
>> --- a/arch/x86/include/asm/dma-mapping.h
>> +++ b/arch/x86/include/asm/dma-mapping.h

..SNIP...

>>  
>> +/*
>> + * If memory encryption is active, the DMA address for an encrypted page may
>> + * be beyond the range of the device. If bounce buffers are required be sure
>> + * that they are not on an encrypted page. This should be called before the
>> + * iotlb area is used.
> 
> Makes sense, but I think at least a dmesg warning here
> might be a good idea.

Good idea.  Should it be a warning when it is first being set up or
a warning the first time the bounce buffers need to be used?  Or maybe
both?

> 
> A boot flag that says "don't enable devices that don't support
> encryption" might be a good idea, too, since most people
> don't read dmesg output and won't notice the message.

I'll look into this. It might be something that can be checked as
part of the device setting its DMA mask or the first time a DMA
API is used if the device doesn't explicitly set its mask.

Thanks,
Tom

> 
> 


Re: [PATCH v2] arm64: SMMU-v2: Workaround for Cavium ThunderX erratum 28168

2016-11-15 Thread David Daney

On 11/15/2016 01:26 AM, Marc Zyngier wrote:

On 15/11/16 07:00, Geetha sowjanya wrote:

From: Tirumalesh Chalamarla 

   This patch implements a workaround for Cavium ThunderX erratum 28168.

   PCI requires that stores complete in order. Due to erratum #28168,
   a PCI-inbound MSI-X store can be delivered to the interrupt controller
   before older PCI-inbound memory stores have been committed.
   Doing a sync on the SMMU makes sure all prior data transfers have
   completed before the ISR is invoked.

Signed-off-by: Tirumalesh Chalamarla 
Signed-off-by: Geetha sowjanya 

[...]

--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -28,6 +28,8 @@
  #include 
  #include 
  #include 
+#include 
+#include 

  #include 
  #include 
@@ -736,6 +738,20 @@ static inline void gic_cpu_pm_init(void) { }

  #define GIC_ID_NR (1U << gic_data.rdists.id_bits)

+/*
+ * Due to #28168 erratum in ThunderX,
+ * we need to make sure DMA data transfer is done before MSIX.
+ */
+static void cavium_irq_perflow_handler(struct irq_data *data)
+{
+   struct pci_dev *pdev;
+
+   pdev = msi_desc_to_pci_dev(irq_data_get_msi_desc(data));


What happens if this is not a PCI device?


+   if ((pdev->vendor != 0x177d) &&
+   ((pdev->device & 0xA000) != 0xA000))
+   cavium_arm_smmu_tlb_sync(&pdev->dev);


I've asked that before. What makes Cavium devices so special that they
are not sensitive to this bug?



This is a heuristic for devices connected to external PCIe buses as 
opposed to on-SoC devices (which don't suffer from the erratum).


In any event what would happen if we got rid of this check and ...
+}
+
  static int gic_irq_domain_map(struct irq_domain *d, unsigned int irq,
  irq_hw_number_t hw)
  {
@@ -773,6 +789,9 @@ static int gic_irq_domain_map(struct irq_domain *d, 
unsigned int irq,
return -EPERM;
irq_domain_set_info(d, irq, hw, chip, d->host_data,
handle_fasteoi_irq, NULL, NULL);
+   if (cpus_have_cap(ARM64_WORKAROUND_CAVIUM_28168))
+   __irq_set_preflow_handler(irq,
+ cavium_irq_perflow_handler);
... move the registration of the preflow_handler into a 
msi_domain_ops.msi_finish() handler in irq-gic-v3-its-pci-msi.c?


There we will know that it is a pci device, and can walk up the bus 
hierarchy to see if there is a Cavium PCIe root port present.  If such a 
port is found, we know we are on an external Cavium PCIe bus, and can 
register the preflow_handler without having to check the device identifiers.
What happens if SMMUv2 is not compiled in? Also, since this only affects
LPI signaling, why is this in the core GICv3 code and not in the ITS.
And more specifically, in the PCI part of the ITS, since you seem to
exclusively consider PCI?


}

return 0;



Thanks,

M.





Re: [RFC PATCH v3 13/20] x86: DMA support for memory encryption

2016-11-15 Thread Radim Krčmář
2016-11-15 11:02-0600, Tom Lendacky:
> On 11/15/2016 8:39 AM, Radim Krčmář wrote:
>> 2016-11-09 18:37-0600, Tom Lendacky:
>>> Since DMA addresses will effectively look like 48-bit addresses when the
>>> memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
>>> device performing the DMA does not support 48-bits. SWIOTLB will be
>>> initialized to create un-encrypted bounce buffers for use by these devices.
>>>
>>> Signed-off-by: Tom Lendacky 
>>> ---
>>> diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
>>> @@ -64,13 +66,15 @@ static struct dma_map_ops swiotlb_dma_ops = {
>>>   * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
>>>   *
>>>   * This returns non-zero if we are forced to use swiotlb (by the boot
>>> - * option).
>>> + * option). If memory encryption is enabled then swiotlb will be set
>>> + * to 1 so that bounce buffers are allocated and used for devices that
>>> + * do not support the addressing range required for the encryption mask.
>>>   */
>>>  int __init pci_swiotlb_detect_override(void)
>>>  {
>>> int use_swiotlb = swiotlb | swiotlb_force;
>>>  
>>> -   if (swiotlb_force)
>>> +   if (swiotlb_force || sme_me_mask)
>>> swiotlb = 1;
>>>  
>>> return use_swiotlb;
>> 
>> We want to return 1 even if only sme_me_mask is 1, because the return
>> value is used for detection.  The following would be less obscure, IMO:
>> 
>>  if (swiotlb_force || sme_me_mask)
>>  swiotlb = 1;
>> 
>>  return swiotlb;
> 
> If we do that then all DMA would go through the swiotlb bounce buffers.

No, that is decided for example in swiotlb_map_page() and we need to
call pci_swiotlb_init() to register that function.

> By setting swiotlb to 1 we indicate that the bounce buffers will be
> needed for those devices that can't support the addressing range when
> the encryption bit is set (48 bit DMA). But if the device can support
> the addressing range we won't use the bounce buffers.

If we return 0 here, then pci_swiotlb_init() will not be called =>
dma_ops won't be set to swiotlb_dma_ops => we won't use bounce buffers.

>> We setup encrypted swiotlb and then decrypt it, but sometimes set it up
>> decrypted (late_alloc) ... why isn't the swiotlb set up decrypted
>> directly?
> 
> When swiotlb is allocated in swiotlb_init(), it is too early to make
> use of the API to change the page attributes. Because of this,
> the callback to make those changes is needed.

Thanks. (I don't know page table setup enough to see a lesser evil. :])

>>> @@ -541,7 +583,7 @@ static phys_addr_t
>>>  map_single(struct device *hwdev, phys_addr_t phys, size_t size,
>>>enum dma_data_direction dir)
>>>  {
>>> -   dma_addr_t start_dma_addr = phys_to_dma(hwdev, io_tlb_start);
>>> +   dma_addr_t start_dma_addr = swiotlb_phys_to_dma(hwdev, io_tlb_start);
>> 
>> We have decrypted io_tlb_start before, so shouldn't its physical address
>> be saved without the sme bit?  (Which changes a lot ...)
> 
> I'm not sure what you mean here, can you elaborate a bit more?

The C-bit (sme bit) is a part of the physical address.
If we know that a certain physical page should be accessed as
unencrypted (the bounce buffer) then the C-bit is 0.
I'm wondering why we save the physical address with the C-bit set when
we know that it can't be accessed that way (because we remove it every
time).

The naming is a bit confusing, because physical addresses are actually
virtualized by SME -- maybe we should be calling them SME addresses?

Re: [RFC PATCH v3 04/20] x86: Handle reduction in physical address size with SME

2016-11-15 Thread Tom Lendacky
On 11/15/2016 10:33 AM, Borislav Petkov wrote:
> On Tue, Nov 15, 2016 at 10:06:16AM -0600, Tom Lendacky wrote:
>> Yes, but that doesn't relate to the physical address space reduction.
>>
>> Once the SYS_CFG MSR bit for SME is set, even if the encryption bit is
>> never used, there is a physical reduction of the address space. So when
>> checking whether to adjust the physical address bits I can't rely on the
>> sme_me_mask, I have to look at the MSR.
>>
>> But when I'm looking to decide whether to encrypt or decrypt something,
>> I use the sme_me_mask to decide if that is needed.  If the sme_me_mask
>> is not set then the encrypt/decrypt op shouldn't be performed.
>>
>> I might not be grasping the point you're trying to make...
> 
> Ok, let me try to summarize how I see it. There are a couple of states:
> 
> * CPUID bit in 0x8000001f - that's SME supported
> 
> * Reduction of address space - MSR bit. That could be called "SME
> BIOS-enabled".
> 
> * SME active. That's both of the above and is sme_me_mask != 0.
> 
> Right?

Correct.

> 
> So you said previously "The feature may be present and enabled even if
> it is not currently active."
> 
> But then you say "active" below
> 
>>> And in patch 12 you have:
>>>
>>> +   /*
>>> +* If memory encryption is active, the trampoline area will need to
>>> +* be in un-encrypted memory in order to bring up other processors
>>> +* successfully.
>>> +*/
>>> +   sme_early_mem_dec(__pa(base), size);
>>> +   sme_set_mem_unenc(base, size);
> 
> and test sme_me_mask. Which makes sense now after having explained which
> hw setting controls what.
> 
> So can we agree on the nomenclature for all the different SME states
> first and use those throughout the code? And hold those states down in
> Documentation/x86/amd-memory-encryption.txt so that it is perfectly
> clear to people looking at the code.

Yup, that sounds good.  I'll update the documentation to clarify the
various states/modes of SME.

> 
> Also, if we need to check those states more than once, we should add
> inline helpers:
> 
> sme_supported()
> sme_bios_enabled()
> sme_active()
> 
> How does that sound?

Sounds good.

Thanks,
Tom
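A sketch of what those helpers could look like; the names come from the discussion above, while the bodies (feature bit, MSR and bit position) are assumptions about how the three states might be derived:

#include <asm/cpufeature.h>
#include <asm/msr.h>

static inline bool sme_supported(void)
{
	/* CPUID Fn8000_001F EAX[0]; assumes an X86_FEATURE_SME flag exists */
	return boot_cpu_has(X86_FEATURE_SME);
}

static inline bool sme_bios_enabled(void)
{
	u64 syscfg;

	rdmsrl(MSR_K8_SYSCFG, syscfg);
	return syscfg & (1ULL << 23);	/* MemEncryptionModEn bit (assumption) */
}

static inline bool sme_active(void)
{
	return sme_me_mask != 0;	/* encryption actually in use */
}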

> 



Re: [RFC PATCH v3 13/20] x86: DMA support for memory encryption

2016-11-15 Thread Tom Lendacky
On 11/15/2016 8:39 AM, Radim Krčmář wrote:
> 2016-11-09 18:37-0600, Tom Lendacky:
>> Since DMA addresses will effectively look like 48-bit addresses when the
>> memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
>> device performing the DMA does not support 48-bits. SWIOTLB will be
>> initialized to create un-encrypted bounce buffers for use by these devices.
>>
>> Signed-off-by: Tom Lendacky 
>> ---
>> diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
>> @@ -30,7 +30,7 @@ static dma_addr_t nommu_map_page(struct device *dev, 
>> struct page *page,
>>   enum dma_data_direction dir,
>>   unsigned long attrs)
>>  {
>> -dma_addr_t bus = page_to_phys(page) + offset;
>> +dma_addr_t bus = phys_to_dma(dev, page_to_phys(page)) + offset;
>>  WARN_ON(size == 0);
>>  if (!check_addr("map_single", dev, bus, size))
>>  return DMA_ERROR_CODE;
>> diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
>> @@ -12,6 +12,8 @@
>>  int swiotlb __read_mostly;
>>  
>>  void *x86_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
>> @@ -64,13 +66,15 @@ static struct dma_map_ops swiotlb_dma_ops = {
>>   * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
>>   *
>>   * This returns non-zero if we are forced to use swiotlb (by the boot
>> - * option).
>> + * option). If memory encryption is enabled then swiotlb will be set
>> + * to 1 so that bounce buffers are allocated and used for devices that
>> + * do not support the addressing range required for the encryption mask.
>>   */
>>  int __init pci_swiotlb_detect_override(void)
>>  {
>>  int use_swiotlb = swiotlb | swiotlb_force;
>>  
>> -if (swiotlb_force)
>> +if (swiotlb_force || sme_me_mask)
>>  swiotlb = 1;
>>  
>>  return use_swiotlb;
> 
> We want to return 1 even if only sme_me_mask is 1, because the return
> value is used for detection.  The following would be less obscure, IMO:
> 
>   if (swiotlb_force || sme_me_mask)
>   swiotlb = 1;
> 
>   return swiotlb;

If we do that then all DMA would go through the swiotlb bounce buffers.
By setting swiotlb to 1 we indicate that the bounce buffers will be
needed for those devices that can't support the addressing range when
the encryption bit is set (48 bit DMA). But if the device can support
the addressing range we won't use the bounce buffers.

> 
>> diff --git a/init/main.c b/init/main.c
>> @@ -598,6 +602,15 @@ asmlinkage __visible void __init start_kernel(void)
>>   */
>>  locking_selftest();
>>  
>> +/*
>> + * This needs to be called before any devices perform DMA
>> + * operations that might use the swiotlb bounce buffers.
>> + * This call will mark the bounce buffers as un-encrypted so
>> + * that their usage will not cause "plain-text" data to be
>> + * decrypted when accessed.
>> + */
>> +mem_encrypt_init();
> 
> (Comments below are connected to the reason why we call this.)
> 
>> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
>> @@ -159,6 +171,31 @@ void swiotlb_print_info(void)
>> +/*
>> + * If memory encryption is active, the DMA address for an encrypted page may
>> + * be beyond the range of the device. If bounce buffers are required be sure
>> + * that they are not on an encrypted page. This should be called before the
>> + * iotlb area is used.
>> + */
>> +void __init swiotlb_clear_encryption(void)
>> +{
>> +void *vaddr;
>> +unsigned long bytes;
>> +
>> +if (no_iotlb_memory || !io_tlb_start || late_alloc)
> 
> io_tlb_start seems redundant -- when can !no_iotlb_memory &&
> !io_tlb_start happen?

Yes, the io_tlb_start check can be removed.

> 
> Is the order of calls
>   1) swiotlb init
>   2) SME init
>   3) swiotlb late init 
> ?

Yes, sort of.  The swiotlb late init may not be called.

> 
> We setup encrypted swiotlb and then decrypt it, but sometimes set it up
> decrypted (late_alloc) ... why isn't the swiotlb set up decrypted
> directly?

When swiotlb is allocated in swiotlb_init(), it is too early to make
use of the API to change the page attributes. Because of this,
the callback to make those changes is needed.

> 
>> +return;
>> +
>> +vaddr = phys_to_virt(io_tlb_start);
>> +bytes = PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT);
>> +swiotlb_set_mem_unenc(vaddr, bytes);
>> +memset(vaddr, 0, bytes);
>> +
>> +vaddr = phys_to_virt(io_tlb_overflow_buffer);
>> +bytes = PAGE_ALIGN(io_tlb_overflow);
>> +swiotlb_set_mem_unenc(vaddr, bytes);
>> +memset(vaddr, 0, bytes);
>> +}
>> +
>> @@ -541,7 +583,7 @@ static phys_addr_t
>>  map_single(struct device *hwdev, phys_addr_t phys, size_t size,
>> enum dma_data_direction dir)
>>  {
>> -dma_addr_t start_dma_addr = phys_to_dma(hwdev, io_tlb_start);
>> +dma_addr_t start_dma_addr = swiotlb_phys_to_dma(hwdev, io_tlb_start);
> 
> We have decrypted 

Re: [RFC PATCH v3 04/20] x86: Handle reduction in physical address size with SME

2016-11-15 Thread Borislav Petkov
On Tue, Nov 15, 2016 at 10:06:16AM -0600, Tom Lendacky wrote:
> Yes, but that doesn't relate to the physical address space reduction.
> 
> Once the SYS_CFG MSR bit for SME is set, even if the encryption bit is
> never used, there is a physical reduction of the address space. So when
> checking whether to adjust the physical address bits I can't rely on the
> sme_me_mask, I have to look at the MSR.
> 
> But when I'm looking to decide whether to encrypt or decrypt something,
> I use the sme_me_mask to decide if that is needed.  If the sme_me_mask
> is not set then the encrypt/decrypt op shouldn't be performed.
> 
> I might not be grasping the point you're trying to make...

Ok, let me try to summarize how I see it. There are a couple of states:

* CPUID bit in 0x8000001f - that's SME supported

* Reduction of address space - MSR bit. That could be called "SME
BIOS-enabled".

* SME active. That's both of the above and is sme_me_mask != 0.

Right?

So you said previously "The feature may be present and enabled even if
it is not currently active."

But then you say "active" below

> > And in patch 12 you have:
> > 
> > +   /*
> > +* If memory encryption is active, the trampoline area will need to
> > +* be in un-encrypted memory in order to bring up other processors
> > +* successfully.
> > +*/
> > +   sme_early_mem_dec(__pa(base), size);
> > +   sme_set_mem_unenc(base, size);

and test sme_me_mask. Which makes sense now after having explained which
hw setting controls what.

So can we agree on the nomenclature for all the different SME states
first and use those throughout the code? And hold those states down in
Documentation/x86/amd-memory-encryption.txt so that it is perfectly
clear to people looking at the code.

Also, if we need to check those states more than once, we should add
inline helpers:

sme_supported()
sme_bios_enabled()
sme_active()

How does that sound?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH] iommu/dma: Stop getting dma_32bit_pfn wrong

2016-11-15 Thread Robin Murphy
On 15/11/16 11:49, Joerg Roedel wrote:
> On Fri, Nov 11, 2016 at 06:30:45PM +, Robin Murphy wrote:
>> iommu_dma_init_domain() was originally written under the misconception
>> that dma_32bit_pfn represented some sort of size limit for IOVA domains.
>> Since the truth is almost the exact opposite of that, rework the logic
>> and comments to reflect its real purpose of optimising lookups when
>> allocating from a subset of the available space.
>>
>> Signed-off-by: Robin Murphy 
>> ---
>>  drivers/iommu/dma-iommu.c | 23 ++-
>>  1 file changed, 18 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index c5ab8667e6f2..ae045a14b530 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -139,6 +139,7 @@ int iommu_dma_init_domain(struct iommu_domain *domain, 
>> dma_addr_t base,
>>  {
>>  struct iova_domain *iovad = cookie_iovad(domain);
>>  unsigned long order, base_pfn, end_pfn;
>> +bool pci = dev && dev_is_pci(dev);
>>  
>>  if (!iovad)
>>  return -ENODEV;
>> @@ -161,19 +162,31 @@ int iommu_dma_init_domain(struct iommu_domain *domain, 
>> dma_addr_t base,
>>  end_pfn = min_t(unsigned long, end_pfn,
>>  domain->geometry.aperture_end >> order);
>>  }
>> +/*
>> + * PCI devices may have larger DMA masks, but still prefer allocating
>> + * within a 32-bit mask to avoid DAC addressing. Such limitations don't
>> + * apply to the typical platform device, so for those we may as well
>> + * leave the cache limit at the top of the range they're likely to use.
>> + */
>> +if (pci)
>> +end_pfn = min_t(unsigned long, end_pfn,
>> +DMA_BIT_MASK(32) >> order);
> 
> Question, does it actually hurt platform devices to follow the same
> allocation strategy as pci devices? I mean, does it hurt enough to
> special-case the code here?

It hurts them in the sense that they get absolutely no benefit from
trying a lower limit first (the device simply has as many address lines
wired to the interconnect as it needs/can drive), therefore there's
nothing to offset the cost of every allocation becoming
try-fail-and-try-again once <32-bits fills up with, say, big GPU
mappings. The maintenance cost of a couple of one-liner if statements
doesn't seem big enough to prevent a significant set of devices - bear
in mind that the first products really using this code in anger don't
have PCI at all (MT8173) - from having straightforward and optimal
allocation behaviour all the time, every time.

I suppose it might bear saying that as a Cambridge Water customer, I do
tend to view PCI in the same light as USB/I2C/etc. as "just another
external bus controller on the outside edge of 'the system'". From that
perspective, avoiding DAC overhead really does look like the minority
special case, although I appreciate that the view from the x86 angle is
rather different. I also suspect that, whether we want to or not, we may
continue to see more on-chip 'PCI' devices which don't really need this
either (because they're really just 'platform' devices talking directly
to whatever coherent interconnect with a config space bolted on somewhere).

Robin.
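To make the "cache limit" point concrete: the clamped end_pfn only feeds the allocator's preferred upper bound, not the domain size. A condensed sketch of the resulting set-up, using the 4.9-era init_iova_domain() signature (whose last argument is the 32-bit PFN hint):

#include <linux/dma-mapping.h>
#include <linux/iova.h>
#include <linux/pci.h>

static void example_setup_iovad(struct iova_domain *iovad, struct device *dev,
				unsigned long order, unsigned long base_pfn,
				unsigned long end_pfn)
{
	/* Only PCI devices benefit from trying below 32 bits first. */
	if (dev && dev_is_pci(dev))
		end_pfn = min_t(unsigned long, end_pfn,
				DMA_BIT_MASK(32) >> order);

	/* (iovad, granule, start_pfn, pfn_32bit) */
	init_iova_domain(iovad, 1UL << order, base_pfn, end_pfn);
}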


Re: [RFC PATCH v3 04/20] x86: Handle reduction in physical address size with SME

2016-11-15 Thread Tom Lendacky
On 11/15/2016 9:33 AM, Borislav Petkov wrote:
> On Tue, Nov 15, 2016 at 08:40:05AM -0600, Tom Lendacky wrote:
>> The feature may be present and enabled even if it is not currently
>> active.  In other words, the SYS_CFG MSR bit could be set but we aren't
>> actually using encryption (sme_me_mask is 0).  As long as the SYS_CFG
>> MSR bit is set we need to take into account the physical reduction in
>> address space.
> 
> But later in the series I see sme_early_mem_enc() which tests exactly
> that mask.

Yes, but that doesn't relate to the physical address space reduction.

Once the SYS_CFG MSR bit for SME is set, even if the encryption bit is
never used, there is a physical reduction of the address space. So when
checking whether to adjust the physical address bits I can't rely on the
sme_me_mask, I have to look at the MSR.

But when I'm looking to decide whether to encrypt or decrypt something,
I use the sme_me_mask to decide if that is needed.  If the sme_me_mask
is not set then the encrypt/decrypt op shouldn't be performed.

I might not be grasping the point you're trying to make...

Thanks,
Tom

> 
> And in patch 12 you have:
> 
> +   /*
> +* If memory encryption is active, the trampoline area will need to
> +* be in un-encrypted memory in order to bring up other processors
> +* successfully.
> +*/
> +   sme_early_mem_dec(__pa(base), size);
> +   sme_set_mem_unenc(base, size);
> 
> What's up?
> 
> IOW, it all sounds to me like you want to have an sme_active() helper
> and use it everywhere.
> 


Re: [RFC PATCH v3 04/20] x86: Handle reduction in physical address size with SME

2016-11-15 Thread Borislav Petkov
On Tue, Nov 15, 2016 at 08:40:05AM -0600, Tom Lendacky wrote:
> The feature may be present and enabled even if it is not currently
> active.  In other words, the SYS_CFG MSR bit could be set but we aren't
> actually using encryption (sme_me_mask is 0).  As long as the SYS_CFG
> MSR bit is set we need to take into account the physical reduction in
> address space.

But later in the series I see sme_early_mem_enc() which tests exactly
that mask.

And in patch 12 you have:

+   /*
+* If memory encryption is active, the trampoline area will need to
+* be in un-encrypted memory in order to bring up other processors
+* successfully.
+*/
+   sme_early_mem_dec(__pa(base), size);
+   sme_set_mem_unenc(base, size);

What's up?

IOW, it all sounds to me like you want to have an sme_active() helper
and use it everywhere.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [RFC PATCH v3 13/20] x86: DMA support for memory encryption

2016-11-15 Thread Michael S. Tsirkin
On Wed, Nov 09, 2016 at 06:37:23PM -0600, Tom Lendacky wrote:
> Since DMA addresses will effectively look like 48-bit addresses when the
> memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
> device performing the DMA does not support 48-bits. SWIOTLB will be
> initialized to create un-encrypted bounce buffers for use by these devices.
> 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/include/asm/dma-mapping.h |5 ++-
>  arch/x86/include/asm/mem_encrypt.h |5 +++
>  arch/x86/kernel/pci-dma.c  |   11 ---
>  arch/x86/kernel/pci-nommu.c|2 +
>  arch/x86/kernel/pci-swiotlb.c  |8 -
>  arch/x86/mm/mem_encrypt.c  |   17 +++
>  include/linux/swiotlb.h|1 +
>  init/main.c|   13 
>  lib/swiotlb.c  |   58 
> +++-
>  9 files changed, 103 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/x86/include/asm/dma-mapping.h 
> b/arch/x86/include/asm/dma-mapping.h
> index 4446162..c9cdcae 100644
> --- a/arch/x86/include/asm/dma-mapping.h
> +++ b/arch/x86/include/asm/dma-mapping.h
> @@ -12,6 +12,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #ifdef CONFIG_ISA
>  # define ISA_DMA_BIT_MASK DMA_BIT_MASK(24)
> @@ -69,12 +70,12 @@ static inline bool dma_capable(struct device *dev, 
> dma_addr_t addr, size_t size)
>  
>  static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
>  {
> - return paddr;
> + return paddr | sme_me_mask;
>  }
>  
>  static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
>  {
> - return daddr;
> + return daddr & ~sme_me_mask;
>  }
>  #endif /* CONFIG_X86_DMA_REMAP */
>  
> diff --git a/arch/x86/include/asm/mem_encrypt.h 
> b/arch/x86/include/asm/mem_encrypt.h
> index d544481..a024451 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -35,6 +35,11 @@ void __init sme_encrypt_ramdisk(resource_size_t paddr,
>  
>  void __init sme_early_init(void);
>  
> +/* Architecture __weak replacement functions */
> +void __init mem_encrypt_init(void);
> +
> +void swiotlb_set_mem_unenc(void *vaddr, unsigned long size);
> +
>  #define __sme_pa(x)  (__pa((x)) | sme_me_mask)
>  #define __sme_pa_nodebug(x)  (__pa_nodebug((x)) | sme_me_mask)
>  
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index d30c377..0ce28df 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -92,9 +92,12 @@ again:
>   /* CMA can be used only in the context which permits sleeping */
>   if (gfpflags_allow_blocking(flag)) {
>   page = dma_alloc_from_contiguous(dev, count, get_order(size));
> - if (page && page_to_phys(page) + size > dma_mask) {
> - dma_release_from_contiguous(dev, page, count);
> - page = NULL;
> + if (page) {
> + addr = phys_to_dma(dev, page_to_phys(page));
> + if (addr + size > dma_mask) {
> + dma_release_from_contiguous(dev, page, count);
> + page = NULL;
> + }
>   }
>   }
>   /* fallback */
> @@ -103,7 +106,7 @@ again:
>   if (!page)
>   return NULL;
>  
> - addr = page_to_phys(page);
> + addr = phys_to_dma(dev, page_to_phys(page));
>   if (addr + size > dma_mask) {
>   __free_pages(page, get_order(size));
>  
> diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
> index 00e71ce..922c10d 100644
> --- a/arch/x86/kernel/pci-nommu.c
> +++ b/arch/x86/kernel/pci-nommu.c
> @@ -30,7 +30,7 @@ static dma_addr_t nommu_map_page(struct device *dev, struct 
> page *page,
>enum dma_data_direction dir,
>unsigned long attrs)
>  {
> - dma_addr_t bus = page_to_phys(page) + offset;
> + dma_addr_t bus = phys_to_dma(dev, page_to_phys(page)) + offset;
>   WARN_ON(size == 0);
>   if (!check_addr("map_single", dev, bus, size))
>   return DMA_ERROR_CODE;
> diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
> index b47edb8..34a9e524 100644
> --- a/arch/x86/kernel/pci-swiotlb.c
> +++ b/arch/x86/kernel/pci-swiotlb.c
> @@ -12,6 +12,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +
>  int swiotlb __read_mostly;
>  
>  void *x86_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
> @@ -64,13 +66,15 @@ static struct dma_map_ops swiotlb_dma_ops = {
>   * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
>   *
>   * This returns non-zero if we are forced to use swiotlb (by the boot
> - * option).
> + * option). If memory encryption is enabled then swiotlb will be set
> + * to 1 so that bounce buffers are allocated and used for devices that
> + * do not support the addressing 

Re: [RFC v2 3/8] iommu/dma: Allow MSI-only cookies

2016-11-15 Thread Robin Murphy
On 14/11/16 23:23, Auger Eric wrote:
> Hi Robin,
> 
> On 14/11/2016 13:36, Robin Murphy wrote:
>> On 04/11/16 11:24, Eric Auger wrote:
>>> From: Robin Murphy 
>>>
>>> IOMMU domain users such as VFIO face a similar problem to DMA API ops
>>> with regard to mapping MSI messages in systems where the MSI write is
>>> subject to IOMMU translation. With the relevant infrastructure now in
>>> place for managed DMA domains, it's actually really simple for other
>>> users to piggyback off that and reap the benefits without giving up
>>> their own IOVA management, and without having to reinvent their own
>>> wheel in the MSI layer.
>>>
>>> Allow such users to opt into automatic MSI remapping by dedicating a
>>> region of their IOVA space to a managed cookie.
>>>
>>> Signed-off-by: Robin Murphy 
>>> Signed-off-by: Eric Auger 
>>
>> OK, following the discussion elsewhere I've had a go at the less stupid,
>> but more involved, version. Thoughts?
> 
> Conceptually I don't have any major objection to the minimalist
> allocation scheme, especially since it follows Joerg's guidance. Maybe the
> only thing is that we do not check that we don't overshoot the reserved MSI region.

Yes, I thought about that and came to the conclusion that it was hard to
justify the extra complexity. Since the caller has to calculate an
appropriate region size to reserve anyway, we might as well just trust
it to be correct. And if the caller did get things wrong, then one or
other iommu_map() is going to fail on the overlapping IOVAs anyway.
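For reference, the "trivial linear allocator" amounts to handing out IOVA space sequentially from the base the caller reserved, instead of running a full iova_domain. A condensed sketch (names illustrative, based on the cookie structure in the patch below):

static dma_addr_t example_msi_cookie_alloc(struct iommu_dma_cookie *cookie,
					   size_t size)
{
	dma_addr_t iova = cookie->msi_iova;

	/* No overshoot check: the caller sized the reserved region. */
	cookie->msi_iova += size;

	return iova;
}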

> 
> Besides there are 2 issues reported below.
> 
>>
>> Robin.
>>
>> ->8-
>> From: Robin Murphy 
>> Subject: [RFC PATCH] iommu/dma: Allow MSI-only cookies
>>
>> IOMMU domain users such as VFIO face a similar problem to DMA API ops
>> with regard to mapping MSI messages in systems where the MSI write is
>> subject to IOMMU translation. With the relevant infrastructure now in
>> place for managed DMA domains, it's actually really simple for other
>> users to piggyback off that and reap the benefits without giving up
>> their own IOVA management, and without having to reinvent their own
>> wheel in the MSI layer.
>>
>> Allow such users to opt into automatic MSI remapping by dedicating a
>> region of their IOVA space to a managed cookie, and extend the mapping
>> routine to implement a trivial linear allocator in such cases, to avoid
>> the needless overhead of a full-blown IOVA domain.
>>
>> Signed-off-by: Robin Murphy 
>> ---
>>  drivers/iommu/dma-iommu.c | 118 
>> --
>>  include/linux/dma-iommu.h |   6 +++
>>  2 files changed, 100 insertions(+), 24 deletions(-)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index c5ab8667e6f2..33d66a8273c6 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -37,10 +37,19 @@ struct iommu_dma_msi_page {
>>  phys_addr_t phys;
>>  };
>>  
>> +enum iommu_dma_cookie_type {
>> +IOMMU_DMA_IOVA_COOKIE,
>> +IOMMU_DMA_MSI_COOKIE,
>> +};
>> +
>>  struct iommu_dma_cookie {
>> -struct iova_domain  iovad;
>> -struct list_headmsi_page_list;
>> -spinlock_t  msi_lock;
>> +union {
>> +struct iova_domain  iovad;
>> +dma_addr_t  msi_iova;
>> +};
>> +struct list_headmsi_page_list;
>> +spinlock_t  msi_lock;
>> +enum iommu_dma_cookie_type  type;
>>  };
>>  
>>  static inline struct iova_domain *cookie_iovad(struct iommu_domain *domain)
>> @@ -53,6 +62,19 @@ int iommu_dma_init(void)
>>  return iova_cache_get();
>>  }
>>  
>> +static struct iommu_dma_cookie *__cookie_alloc(enum iommu_dma_cookie_type 
>> type)
>> +{
>> +struct iommu_dma_cookie *cookie;
>> +
>> +cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
>> +if (cookie) {
>> +spin_lock_init(&cookie->msi_lock);
>> +INIT_LIST_HEAD(&cookie->msi_page_list);
>> +cookie->type = type;
>> +}
>> +return cookie;
>> +}
>> +
>>  /**
>>   * iommu_get_dma_cookie - Acquire DMA-API resources for a domain
>>   * @domain: IOMMU domain to prepare for DMA-API usage
>> @@ -62,25 +84,53 @@ int iommu_dma_init(void)
>>   */
>>  int iommu_get_dma_cookie(struct iommu_domain *domain)
>>  {
>> -struct iommu_dma_cookie *cookie;
>> -
>>  if (domain->iova_cookie)
>>  return -EEXIST;
>>  
>> -cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
>> -if (!cookie)
>> +domain->iova_cookie = __cookie_alloc(IOMMU_DMA_IOVA_COOKIE);
>> +if (!domain->iova_cookie)
>>  return -ENOMEM;
>>  
>> -spin_lock_init(&cookie->msi_lock);
>> -INIT_LIST_HEAD(&cookie->msi_page_list);
>> -domain->iova_cookie = cookie;
>>  return 0;
>>  }
>>  EXPORT_SYMBOL(iommu_get_dma_cookie);
>>  
>>  /**
>> + * iommu_get_msi_cookie - Acquire just MSI remapping 

Re: [RFC PATCH v3 13/20] x86: DMA support for memory encryption

2016-11-15 Thread Radim Krčmář
2016-11-09 18:37-0600, Tom Lendacky:
> Since DMA addresses will effectively look like 48-bit addresses when the
> memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
> device performing the DMA does not support 48-bits. SWIOTLB will be
> initialized to create un-encrypted bounce buffers for use by these devices.
> 
> Signed-off-by: Tom Lendacky 
> ---
> diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
> @@ -30,7 +30,7 @@ static dma_addr_t nommu_map_page(struct device *dev, struct 
> page *page,
>enum dma_data_direction dir,
>unsigned long attrs)
>  {
> - dma_addr_t bus = page_to_phys(page) + offset;
> + dma_addr_t bus = phys_to_dma(dev, page_to_phys(page)) + offset;
>   WARN_ON(size == 0);
>   if (!check_addr("map_single", dev, bus, size))
>   return DMA_ERROR_CODE;
> diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
> @@ -12,6 +12,8 @@
>  int swiotlb __read_mostly;
>  
>  void *x86_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
> @@ -64,13 +66,15 @@ static struct dma_map_ops swiotlb_dma_ops = {
>   * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
>   *
>   * This returns non-zero if we are forced to use swiotlb (by the boot
> - * option).
> + * option). If memory encryption is enabled then swiotlb will be set
> + * to 1 so that bounce buffers are allocated and used for devices that
> + * do not support the addressing range required for the encryption mask.
>   */
>  int __init pci_swiotlb_detect_override(void)
>  {
>   int use_swiotlb = swiotlb | swiotlb_force;
>  
> - if (swiotlb_force)
> + if (swiotlb_force || sme_me_mask)
>   swiotlb = 1;
>  
>   return use_swiotlb;

We want to return 1 even if only sme_me_mask is 1, because the return
value is used for detection.  The following would be less obscure, IMO:

if (swiotlb_force || sme_me_mask)
swiotlb = 1;

return swiotlb;

> diff --git a/init/main.c b/init/main.c
> @@ -598,6 +602,15 @@ asmlinkage __visible void __init start_kernel(void)
>*/
>   locking_selftest();
>  
> + /*
> +  * This needs to be called before any devices perform DMA
> +  * operations that might use the swiotlb bounce buffers.
> +  * This call will mark the bounce buffers as un-encrypted so
> +  * that their usage will not cause "plain-text" data to be
> +  * decrypted when accessed.
> +  */
> + mem_encrypt_init();

(Comments below are connected to the reason why we call this.)

> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> @@ -159,6 +171,31 @@ void swiotlb_print_info(void)
> +/*
> + * If memory encryption is active, the DMA address for an encrypted page may
> + * be beyond the range of the device. If bounce buffers are required be sure
> + * that they are not on an encrypted page. This should be called before the
> + * iotlb area is used.
> + */
> +void __init swiotlb_clear_encryption(void)
> +{
> + void *vaddr;
> + unsigned long bytes;
> +
> + if (no_iotlb_memory || !io_tlb_start || late_alloc)

io_tlb_start seems redundant -- when can !no_iotlb_memory &&
!io_tlb_start happen?

Is the order of calls
  1) swiotlb init
  2) SME init
  3) swiotlb late init 
?

We setup encrypted swiotlb and then decrypt it, but sometimes set it up
decrypted (late_alloc) ... why isn't the swiotlb set up decrypted
directly?

> + return;
> +
> + vaddr = phys_to_virt(io_tlb_start);
> + bytes = PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT);
> + swiotlb_set_mem_unenc(vaddr, bytes);
> + memset(vaddr, 0, bytes);
> +
> + vaddr = phys_to_virt(io_tlb_overflow_buffer);
> + bytes = PAGE_ALIGN(io_tlb_overflow);
> + swiotlb_set_mem_unenc(vaddr, bytes);
> + memset(vaddr, 0, bytes);
> +}
> +
> @@ -541,7 +583,7 @@ static phys_addr_t
>  map_single(struct device *hwdev, phys_addr_t phys, size_t size,
>  enum dma_data_direction dir)
>  {
> - dma_addr_t start_dma_addr = phys_to_dma(hwdev, io_tlb_start);
> + dma_addr_t start_dma_addr = swiotlb_phys_to_dma(hwdev, io_tlb_start);

We have decrypted io_tlb_start before, so shouldn't its physical address
be saved without the sme bit?  (Which changes a lot ...)

Thanks.


Re: [RFC PATCH v3 04/20] x86: Handle reduction in physical address size with SME

2016-11-15 Thread Tom Lendacky
On 11/15/2016 6:14 AM, Borislav Petkov wrote:
> On Tue, Nov 15, 2016 at 01:10:35PM +0100, Joerg Roedel wrote:
>> Maybe add a comment here why you can't use cpu_has (yet).
> 
> So that could be alleviated by moving this function *after*
> init_scattered_cpuid_features(). Then you can simply do *cpu_has().

Yes, I can move it after init_scattered_cpuid_features() and then use
the cpu_has() function.  I'll make sure to include a comment that the
function needs to be called after init_scattered_cpuid_features().

> 
> Also, I'm not sure why we're checking CPUID for the SME feature when we
> have sme_get_me_mask() et al which have been setup much earlier...
> 

The feature may be present and enabled even if it is not currently
active.  In other words, the SYS_CFG MSR bit could be set but we aren't
actually using encryption (sme_me_mask is 0).  As long as the SYS_CFG
MSR bit is set we need to take into account the physical reduction in
address space.

Thanks,
Tom


Re: [RFC PATCH v3 04/20] x86: Handle reduction in physical address size with SME

2016-11-15 Thread Tom Lendacky
On 11/15/2016 6:10 AM, Joerg Roedel wrote:
> On Wed, Nov 09, 2016 at 06:35:13PM -0600, Tom Lendacky wrote:
>> +/*
>> + * AMD Secure Memory Encryption (SME) can reduce the size of the physical
>> + * address space if it is enabled, even if memory encryption is not active.
>> + * Adjust x86_phys_bits if SME is enabled.
>> + */
>> +static void phys_bits_adjust(struct cpuinfo_x86 *c)
>> +{
> 
> Better call this function amd_sme_phys_bits_adjust(). This name makes it
> clear at the call-site why it is there and what it does.

Will do.

> 
>> +u32 eax, ebx, ecx, edx;
>> +u64 msr;
>> +
>> +if (c->x86_vendor != X86_VENDOR_AMD)
>> +return;
>> +
>> +if (c->extended_cpuid_level < 0x8000001f)
>> +return;
>> +
>> +/* Check for SME feature */
>> +cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);
>> +if (!(eax & 0x01))
>> +return;
> 
> Maybe add a comment here why you can't use cpu_has (yet).
> 

Ok, will do.

Thanks,
Tom



Re: [PATCH v7 00/16] ACPI IORT ARM SMMU support

2016-11-15 Thread Lorenzo Pieralisi
On Tue, Nov 15, 2016 at 02:04:09PM +0100, Rafael J. Wysocki wrote:
> On Tue, Nov 15, 2016 at 11:12 AM, Lorenzo Pieralisi
>  wrote:
> > Hi Rafael,
> >
> > On Thu, Nov 10, 2016 at 12:36:12AM +0100, Rafael J. Wysocki wrote:
> >> Hi Lorenzo,
> >>
> >> On Wed, Nov 9, 2016 at 3:19 PM, Lorenzo Pieralisi
> >>  wrote:
> >> > This patch series is v7 of a previous posting:
> >> >
> >> > https://lkml.org/lkml/2016/10/18/506
> >>
> >> I don't see anything objectionable in this series.
> >>
> >> Please let me know which patches in particular to look at in detail.
> >
> > Are patches 7 and, consequently, 16 (which builds on top of 7) OK with
> > you? They should be equivalent to a nop on anything other than ARM64,
> > but they touch the ACPI core, so they need an ACK if they are OK, please.
> 
> Yes, they are.  Please feel free to add my ACKs to those or I will
> send them separately later today.

Thank you very much. I will add them to v8, which should be final;
I will send it when the dust has settled on patch 4, which looks
like the last bit left to sort out (we may not even need a v8 at all).

Thanks !
Lorenzo


[RFC v3 00/10] KVM PCIe/MSI passthrough on ARM/ARM64 and IOVA reserved regions

2016-11-15 Thread Eric Auger
Following the LPC discussions, we now report reserved regions through the
iommu-group sysfs reserved_regions attribute file.

Reserved regions are populated through the IOMMU get_resv_region callback
(former get_dm_regions), now implemented by amd-iommu, intel-iommu and
arm-smmu.

The intel-iommu driver reports the [FEE0_0000h - FEF0_0000h] MSI window as an
IOMMU_RESV_NOMAP reserved region.

arm-smmu reports the MSI window (arbitrarily located at 0x8000000 and
1MB large) and the PCI host bridge windows.
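For orientation, this is roughly what a driver-side callback looks like under this scheme; the helper signature and the MSI window constants below are assumptions based on this cover letter, not copied from the series:

#include <linux/iommu.h>
#include <linux/list.h>

#define EXAMPLE_MSI_IOVA_BASE	0x8000000
#define EXAMPLE_MSI_IOVA_LENGTH	0x100000

/* Report the fixed window reserved for MSI doorbells to the core. */
static void example_get_resv_regions(struct device *dev, struct list_head *head)
{
	struct iommu_resv_region *region;

	region = iommu_alloc_resv_region(EXAMPLE_MSI_IOVA_BASE,
					 EXAMPLE_MSI_IOVA_LENGTH,
					 IOMMU_RESV_MSI);
	if (region)
		list_add_tail(&region->list, head);
}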

The series integrates a not officially posted patch from Robin:
"iommu/dma: Allow MSI-only cookies".

This series currently does not address IRQ safety assessment.

Best Regards

Eric

Git: complete series available at
https://github.com/eauger/linux/tree/v4.9-rc5-reserved-rfc-v3

History:
RFC v2 -> v3:
- switch to an iommu-group sysfs API
- use new dummy allocator provided by Robin
- dummy allocator initialized by vfio-iommu-type1 after enumerating
  the reserved regions
- at the moment ARM MSI base address/size is left unchanged compared
  to v2
- we currently report reserved regions and not usable IOVA regions as
  requested by Alex

RFC v1 -> v2:
- fix intel_add_reserved_regions
- add mutex lock/unlock in vfio_iommu_type1


Eric Auger (10):
  iommu/dma: Allow MSI-only cookies
  iommu: Rename iommu_dm_regions into iommu_resv_regions
  iommu: Add new reserved IOMMU attributes
  iommu: iommu_alloc_resv_region
  iommu: Do not map reserved regions
  iommu: iommu_get_group_resv_regions
  iommu: Implement reserved_regions iommu-group sysfs file
  iommu/vt-d: Implement reserved region get/put callbacks
  iommu/arm-smmu: Implement reserved region get/put callbacks
  vfio/type1: Get MSI cookie

 drivers/iommu/amd_iommu.c   |  20 +++---
 drivers/iommu/arm-smmu.c|  52 +++
 drivers/iommu/dma-iommu.c   | 116 ++---
 drivers/iommu/intel-iommu.c |  50 ++
 drivers/iommu/iommu.c   | 141 
 drivers/vfio/vfio_iommu_type1.c |  26 
 include/linux/dma-iommu.h   |   7 ++
 include/linux/iommu.h   |  49 ++
 8 files changed, 391 insertions(+), 70 deletions(-)

-- 
1.9.1



[RFC v3 10/10] vfio/type1: Get MSI cookie

2016-11-15 Thread Eric Auger
When attaching a group to the container, handle the group's
reserved regions, particularly the IOMMU_RESV_MSI region,
which requires an IOVA allocator to be initialized through
the iommu_get_msi_cookie() API. This allows the MSI IOVAs
to be transparently allocated in the MSI controller's compose() callback.

Signed-off-by: Eric Auger 
---
 drivers/vfio/vfio_iommu_type1.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2ba1942..701d8a8 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define DRIVER_VERSION  "0.2"
 #define DRIVER_AUTHOR   "Alex Williamson "
@@ -734,6 +735,27 @@ static void vfio_test_domain_fgsp(struct vfio_domain 
*domain)
__free_pages(pages, order);
 }
 
+static int vfio_iommu_handle_resv_regions(struct iommu_domain *domain,
+ struct iommu_group *group)
+{
+   struct list_head group_resv_regions;
+   struct iommu_resv_region *region, *next;
+   int ret = 0;
+
+   INIT_LIST_HEAD(_resv_regions);
+   iommu_get_group_resv_regions(group, _resv_regions);
+   list_for_each_entry(region, _resv_regions, list) {
+   if (region->prot & IOMMU_RESV_MSI) {
+   ret = iommu_get_msi_cookie(domain, region->start);
+   if (ret)
+   break;
+   }
+   }
+   list_for_each_entry_safe(region, next, _resv_regions, list)
+   kfree(region);
+   return ret;
+}
+
 static int vfio_iommu_type1_attach_group(void *iommu_data,
 struct iommu_group *iommu_group)
 {
@@ -834,6 +856,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
if (ret)
goto out_detach;
 
+   ret = vfio_iommu_handle_resv_regions(domain->domain, iommu_group);
+   if (ret)
+   goto out_detach;
+
list_add(>next, >domain_list);
 
mutex_unlock(>lock);
-- 
1.9.1



[RFC v3 01/10] iommu/dma: Allow MSI-only cookies

2016-11-15 Thread Eric Auger
IOMMU domain users such as VFIO face a similar problem to DMA API ops
with regard to mapping MSI messages in systems where the MSI write is
subject to IOMMU translation. With the relevant infrastructure now in
place for managed DMA domains, it's actually really simple for other
users to piggyback off that and reap the benefits without giving up
their own IOVA management, and without having to reinvent their own
wheel in the MSI layer.

Allow such users to opt into automatic MSI remapping by dedicating a
region of their IOVA space to a managed cookie, and extend the mapping
routine to implement a trivial linear allocator in such cases, to avoid
the needless overhead of a full-blown IOVA domain.

Signed-off-by: Robin Murphy 
Signed-off-by: Eric Auger 

---
[Eric] fixed 2 issues on MSI path
---
 drivers/iommu/dma-iommu.c | 116 +-
 include/linux/dma-iommu.h |   7 +++
 2 files changed, 101 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index c5ab866..793103f 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -37,10 +37,19 @@ struct iommu_dma_msi_page {
phys_addr_t phys;
 };
 
+enum iommu_dma_cookie_type {
+   IOMMU_DMA_IOVA_COOKIE,
+   IOMMU_DMA_MSI_COOKIE,
+};
+
 struct iommu_dma_cookie {
-   struct iova_domain  iovad;
-   struct list_headmsi_page_list;
-   spinlock_t  msi_lock;
+   union {
+   struct iova_domain  iovad;
+   dma_addr_t  msi_iova;
+   };
+   struct list_headmsi_page_list;
+   spinlock_t  msi_lock;
+   enum iommu_dma_cookie_type  type;
 };
 
 static inline struct iova_domain *cookie_iovad(struct iommu_domain *domain)
@@ -53,6 +62,19 @@ int iommu_dma_init(void)
return iova_cache_get();
 }
 
+static struct iommu_dma_cookie *__cookie_alloc(enum iommu_dma_cookie_type type)
+{
+   struct iommu_dma_cookie *cookie;
+
+   cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
+   if (cookie) {
+   spin_lock_init(>msi_lock);
+   INIT_LIST_HEAD(>msi_page_list);
+   cookie->type = type;
+   }
+   return cookie;
+}
+
 /**
  * iommu_get_dma_cookie - Acquire DMA-API resources for a domain
  * @domain: IOMMU domain to prepare for DMA-API usage
@@ -62,25 +84,53 @@ int iommu_dma_init(void)
  */
 int iommu_get_dma_cookie(struct iommu_domain *domain)
 {
+   if (domain->iova_cookie)
+   return -EEXIST;
+
+   domain->iova_cookie = __cookie_alloc(IOMMU_DMA_IOVA_COOKIE);
+   if (!domain->iova_cookie)
+   return -ENOMEM;
+
+   return 0;
+}
+EXPORT_SYMBOL(iommu_get_dma_cookie);
+
+/**
+ * iommu_get_msi_cookie - Acquire just MSI remapping resources
+ * @domain: IOMMU domain to prepare
+ * @base: Start address of IOVA region for MSI mappings
+ *
+ * Users who manage their own IOVA allocation and do not want DMA API support,
+ * but would still like to take advantage of automatic MSI remapping, can use
+ * this to initialise their own domain appropriately. Users should reserve a
+ * contiguous IOVA region, starting at @base, large enough to accommodate the
+ * number of PAGE_SIZE mappings necessary to cover every MSI doorbell address
+ * used by the devices attached to @domain.
+ */
+int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
+{
struct iommu_dma_cookie *cookie;
 
+   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+   return -EINVAL;
+
if (domain->iova_cookie)
return -EEXIST;
 
-   cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
+   cookie = __cookie_alloc(IOMMU_DMA_MSI_COOKIE);
if (!cookie)
return -ENOMEM;
 
-   spin_lock_init(>msi_lock);
-   INIT_LIST_HEAD(>msi_page_list);
+   cookie->msi_iova = base;
domain->iova_cookie = cookie;
return 0;
 }
-EXPORT_SYMBOL(iommu_get_dma_cookie);
+EXPORT_SYMBOL(iommu_get_msi_cookie);
 
 /**
  * iommu_put_dma_cookie - Release a domain's DMA mapping resources
- * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
+ * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie() or
+ *  iommu_get_msi_cookie()
  *
  * IOMMU drivers should normally call this from their domain_free callback.
  */
@@ -92,7 +142,7 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
if (!cookie)
return;
 
-   if (cookie->iovad.granule)
+   if (cookie->type == IOMMU_DMA_IOVA_COOKIE && cookie->iovad.granule)
put_iova_domain(>iovad);
 
list_for_each_entry_safe(msi, tmp, >msi_page_list, list) {
@@ -137,11 +187,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
 int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
u64 size, struct 

[RFC v3 09/10] iommu/arm-smmu: Implement reserved region get/put callbacks

2016-11-15 Thread Eric Auger
The get() populates the list with the PCI host bridge windows
and the MSI IOVA range.

At the moment an arbitrary MSI IOVA window is set at 0x8000000,
with a size of 1MB. This will allow that information to be reported
through the iommu-group sysfs attribute.

Signed-off-by: Eric Auger 

---

RFC v2 -> v3:
- use existing get/put_resv_regions

RFC v1 -> v2:
- use defines for MSI IOVA base and length
---
 drivers/iommu/arm-smmu.c | 52 
 1 file changed, 52 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 8f72814..81f1a83 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -278,6 +278,9 @@ enum arm_smmu_s2cr_privcfg {
 
 #define FSYNR0_WNR (1 << 4)
 
+#define MSI_IOVA_BASE  0x8000000
+#define MSI_IOVA_LENGTH  0x100000
+
 static int force_stage;
 module_param(force_stage, int, S_IRUGO);
 MODULE_PARM_DESC(force_stage,
@@ -1545,6 +1548,53 @@ static int arm_smmu_of_xlate(struct device *dev, struct 
of_phandle_args *args)
return iommu_fwspec_add_ids(dev, , 1);
 }
 
+static void arm_smmu_get_resv_regions(struct device *dev,
+ struct list_head *head)
+{
+   struct iommu_resv_region *region;
+   struct pci_host_bridge *bridge;
+   struct resource_entry *window;
+
+   /* MSI region */
+   region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
+IOMMU_RESV_MSI);
+   if (!region)
+   return;
+
+   list_add_tail(>list, head);
+
+   if (!dev_is_pci(dev))
+   return;
+
+   bridge = pci_find_host_bridge(to_pci_dev(dev)->bus);
+
+   resource_list_for_each_entry(window, >windows) {
+   phys_addr_t start;
+   size_t length;
+
+   if (resource_type(window->res) != IORESOURCE_MEM &&
+   resource_type(window->res) != IORESOURCE_IO)
+   continue;
+
+   start = window->res->start - window->offset;
+   length = window->res->end - window->res->start + 1;
+   region = iommu_alloc_resv_region(start, length,
+IOMMU_RESV_NOMAP);
+   if (!region)
+   return;
+   list_add_tail(>list, head);
+   }
+}
+
+static void arm_smmu_put_resv_regions(struct device *dev,
+ struct list_head *head)
+{
+   struct iommu_resv_region *entry, *next;
+
+   list_for_each_entry_safe(entry, next, head, list)
+   kfree(entry);
+}
+
 static struct iommu_ops arm_smmu_ops = {
.capable= arm_smmu_capable,
.domain_alloc   = arm_smmu_domain_alloc,
@@ -1560,6 +1610,8 @@ static int arm_smmu_of_xlate(struct device *dev, struct 
of_phandle_args *args)
.domain_get_attr= arm_smmu_domain_get_attr,
.domain_set_attr= arm_smmu_domain_set_attr,
.of_xlate   = arm_smmu_of_xlate,
+   .get_resv_regions   = arm_smmu_get_resv_regions,
+   .put_resv_regions   = arm_smmu_put_resv_regions,
.pgsize_bitmap  = -1UL, /* Restricted during device attach */
 };
 
-- 
1.9.1



[RFC v3 08/10] iommu/vt-d: Implement reserved region get/put callbacks

2016-11-15 Thread Eric Auger
This patch registers the [FEE0_0000h - FEF0_0000h] 1MB MSI range
as a reserved region. This will allow that range to be reported
in the iommu-group sysfs.

Signed-off-by: Eric Auger 

---

RFCv2 -> RFCv3:
- use get/put_resv_region callbacks.

RFC v1 -> RFC v2:
- fix intel_iommu_add_reserved_regions name
- use IOAPIC_RANGE_START and IOAPIC_RANGE_END defines
- return if the MSI region is already registered;
---
 drivers/iommu/intel-iommu.c | 50 +
 1 file changed, 37 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 3965e73..8dada8f 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5183,6 +5183,28 @@ static void intel_iommu_remove_device(struct device *dev)
iommu_device_unlink(iommu->iommu_dev, dev);
 }
 
+static void intel_iommu_get_resv_regions(struct device *device,
+struct list_head *head)
+{
+   struct iommu_resv_region *reg;
+
+   reg = iommu_alloc_resv_region(IOAPIC_RANGE_START,
+ IOAPIC_RANGE_END - IOAPIC_RANGE_START + 1,
+ IOMMU_RESV_NOMAP);
+   if (!reg)
+   return;
+   list_add_tail(>list, head);
+}
+
+static void intel_iommu_put_resv_regions(struct device *dev,
+struct list_head *head)
+{
+   struct iommu_resv_region *entry, *next;
+
+   list_for_each_entry_safe(entry, next, head, list)
+   kfree(entry);
+}
+
 #ifdef CONFIG_INTEL_IOMMU_SVM
 int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev 
*sdev)
 {
@@ -5292,19 +5314,21 @@ struct intel_iommu *intel_svm_device_to_iommu(struct 
device *dev)
 #endif /* CONFIG_INTEL_IOMMU_SVM */
 
 static const struct iommu_ops intel_iommu_ops = {
-   .capable= intel_iommu_capable,
-   .domain_alloc   = intel_iommu_domain_alloc,
-   .domain_free= intel_iommu_domain_free,
-   .attach_dev = intel_iommu_attach_device,
-   .detach_dev = intel_iommu_detach_device,
-   .map= intel_iommu_map,
-   .unmap  = intel_iommu_unmap,
-   .map_sg = default_iommu_map_sg,
-   .iova_to_phys   = intel_iommu_iova_to_phys,
-   .add_device = intel_iommu_add_device,
-   .remove_device  = intel_iommu_remove_device,
-   .device_group   = pci_device_group,
-   .pgsize_bitmap  = INTEL_IOMMU_PGSIZES,
+   .capable= intel_iommu_capable,
+   .domain_alloc   = intel_iommu_domain_alloc,
+   .domain_free= intel_iommu_domain_free,
+   .attach_dev = intel_iommu_attach_device,
+   .detach_dev = intel_iommu_detach_device,
+   .map= intel_iommu_map,
+   .unmap  = intel_iommu_unmap,
+   .map_sg = default_iommu_map_sg,
+   .iova_to_phys   = intel_iommu_iova_to_phys,
+   .add_device = intel_iommu_add_device,
+   .remove_device  = intel_iommu_remove_device,
+   .get_resv_regions   = intel_iommu_get_resv_regions,
+   .put_resv_regions   = intel_iommu_put_resv_regions,
+   .device_group   = pci_device_group,
+   .pgsize_bitmap  = INTEL_IOMMU_PGSIZES,
 };
 
 static void quirk_iommu_g4x_gfx(struct pci_dev *dev)
-- 
1.9.1



[RFC v3 02/10] iommu: Rename iommu_dm_regions into iommu_resv_regions

2016-11-15 Thread Eric Auger
We want to extend the callbacks used for dm regions and
use them for reserved regions. Reserved regions can be
- directly mapped regions
- regions that cannot be iommu mapped (PCI host bridge windows, ...)
- MSI regions (because they belong to another address space or because
  they are not translated by the IOMMU and need special handling)

So let's rename the struct and also the callbacks.

Signed-off-by: Eric Auger 
---
 drivers/iommu/amd_iommu.c | 20 ++--
 drivers/iommu/iommu.c | 22 +++---
 include/linux/iommu.h | 29 +++--
 3 files changed, 36 insertions(+), 35 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 754595e..a6c351d 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3159,8 +3159,8 @@ static bool amd_iommu_capable(enum iommu_cap cap)
return false;
 }
 
-static void amd_iommu_get_dm_regions(struct device *dev,
-struct list_head *head)
+static void amd_iommu_get_resv_regions(struct device *dev,
+  struct list_head *head)
 {
struct unity_map_entry *entry;
int devid;
@@ -3170,7 +3170,7 @@ static void amd_iommu_get_dm_regions(struct device *dev,
return;
 
list_for_each_entry(entry, _iommu_unity_map, list) {
-   struct iommu_dm_region *region;
+   struct iommu_resv_region *region;
 
if (devid < entry->devid_start || devid > entry->devid_end)
continue;
@@ -3193,18 +3193,18 @@ static void amd_iommu_get_dm_regions(struct device *dev,
}
 }
 
-static void amd_iommu_put_dm_regions(struct device *dev,
+static void amd_iommu_put_resv_regions(struct device *dev,
 struct list_head *head)
 {
-   struct iommu_dm_region *entry, *next;
+   struct iommu_resv_region *entry, *next;
 
list_for_each_entry_safe(entry, next, head, list)
kfree(entry);
 }
 
-static void amd_iommu_apply_dm_region(struct device *dev,
+static void amd_iommu_apply_resv_region(struct device *dev,
  struct iommu_domain *domain,
- struct iommu_dm_region *region)
+ struct iommu_resv_region *region)
 {
struct dma_ops_domain *dma_dom = to_dma_ops_domain(to_pdomain(domain));
unsigned long start, end;
@@ -3228,9 +3228,9 @@ static void amd_iommu_apply_dm_region(struct device *dev,
.add_device = amd_iommu_add_device,
.remove_device = amd_iommu_remove_device,
.device_group = amd_iommu_device_group,
-   .get_dm_regions = amd_iommu_get_dm_regions,
-   .put_dm_regions = amd_iommu_put_dm_regions,
-   .apply_dm_region = amd_iommu_apply_dm_region,
+   .get_resv_regions = amd_iommu_get_resv_regions,
+   .put_resv_regions = amd_iommu_put_resv_regions,
+   .apply_resv_region = amd_iommu_apply_resv_region,
.pgsize_bitmap  = AMD_IOMMU_PGSIZES,
 };
 
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9a2f196..c7ed334 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -318,7 +318,7 @@ static int iommu_group_create_direct_mappings(struct 
iommu_group *group,
  struct device *dev)
 {
struct iommu_domain *domain = group->default_domain;
-   struct iommu_dm_region *entry;
+   struct iommu_resv_region *entry;
struct list_head mappings;
unsigned long pg_size;
int ret = 0;
@@ -331,14 +331,14 @@ static int iommu_group_create_direct_mappings(struct 
iommu_group *group,
pg_size = 1UL << __ffs(domain->pgsize_bitmap);
INIT_LIST_HEAD();
 
-   iommu_get_dm_regions(dev, );
+   iommu_get_resv_regions(dev, );
 
/* We need to consider overlapping regions for different devices */
list_for_each_entry(entry, , list) {
dma_addr_t start, end, addr;
 
-   if (domain->ops->apply_dm_region)
-   domain->ops->apply_dm_region(dev, domain, entry);
+   if (domain->ops->apply_resv_region)
+   domain->ops->apply_resv_region(dev, domain, entry);
 
start = ALIGN(entry->start, pg_size);
end   = ALIGN(entry->start + entry->length, pg_size);
@@ -358,7 +358,7 @@ static int iommu_group_create_direct_mappings(struct 
iommu_group *group,
}
 
 out:
-   iommu_put_dm_regions(dev, );
+   iommu_put_resv_regions(dev, );
 
return ret;
 }
@@ -1546,20 +1546,20 @@ int iommu_domain_set_attr(struct iommu_domain *domain,
 }
 EXPORT_SYMBOL_GPL(iommu_domain_set_attr);
 
-void iommu_get_dm_regions(struct device *dev, struct list_head *list)
+void iommu_get_resv_regions(struct device *dev, struct list_head *list)
 {
const struct iommu_ops *ops = 

[RFC v3 07/10] iommu: Implement reserved_regions iommu-group sysfs file

2016-11-15 Thread Eric Auger
A new iommu-group sysfs attribute file is introduced. It contains
the list of reserved regions for the iommu-group. Each reserved
region is described on a separate line:
- first field is the start IOVA address,
- second is the end IOVA address.
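
For illustration only (the addresses below are made up; actual values
depend on the platform and on the regions reported by the IOMMU driver),
the group attribute, e.g. /sys/kernel/iommu_groups/<n>/reserved_regions,
would then read something like:

0x0000000008000000 0x00000000080fffff
0x0000004000000000 0x0000007fffffffff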

Signed-off-by: Eric Auger 

---

The file layout is inspired by /sys/bus/pci/devices/BDF/resource.
I also read Documentation/filesystems/sysfs.txt so I expect this
to be frowned upon.
---
 drivers/iommu/iommu.c | 47 +++
 1 file changed, 47 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index e0fbcc5..ce9a465 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static struct kset *iommu_group_kset;
@@ -186,8 +187,47 @@ int iommu_get_group_resv_regions(struct iommu_group *group,
 }
 EXPORT_SYMBOL_GPL(iommu_get_group_resv_regions);
 
+static int iommu_resv_region_cmp(void *priv, struct list_head *a,
+  struct list_head *b)
+{
+   struct iommu_resv_region *rega, *regb;
+
+   rega = container_of(a, struct iommu_resv_region, list);
+   regb = container_of(b, struct iommu_resv_region, list);
+
+   if (rega->start + rega->length - 1 < regb->start)
+   return -1;
+   if (regb->start + regb->length - 1 < rega->start)
+   return 1;
+   else
+   return 0;
+}
+
+static ssize_t iommu_group_show_resv_regions(struct iommu_group *group,
+char *buf)
+{
+   struct iommu_resv_region *region, *next;
+   struct list_head group_resv_regions;
+   char *str = buf;
+
+   INIT_LIST_HEAD(_resv_regions);
+   iommu_get_group_resv_regions(group, _resv_regions);
+   list_sort(NULL, _resv_regions, iommu_resv_region_cmp);
+
+   list_for_each_entry_safe(region, next, _resv_regions, list) {
+   str += sprintf(str, "0x%016llx 0x%016llx\n", region->start,
+  region->start + region->length - 1);
+   kfree(region);
+   }
+
+   return (str - buf);
+}
+
 static IOMMU_GROUP_ATTR(name, S_IRUGO, iommu_group_show_name, NULL);
 
+static IOMMU_GROUP_ATTR(reserved_regions, S_IRUGO,
+   iommu_group_show_resv_regions, NULL);
+
 static void iommu_group_release(struct kobject *kobj)
 {
struct iommu_group *group = to_iommu_group(kobj);
@@ -202,6 +242,8 @@ static void iommu_group_release(struct kobject *kobj)
if (group->default_domain)
iommu_domain_free(group->default_domain);
 
+   iommu_group_remove_file(group, _group_attr_reserved_regions);
+
kfree(group->name);
kfree(group);
 }
@@ -265,6 +307,11 @@ struct iommu_group *iommu_group_alloc(void)
 */
kobject_put(>kobj);
 
+   ret = iommu_group_create_file(group,
+ _group_attr_reserved_regions);
+   if (ret)
+   return ERR_PTR(ret);
+
pr_debug("Allocated group %d\n", group->id);
 
return group;
-- 
1.9.1



[RFC v3 04/10] iommu: iommu_alloc_resv_region

2016-11-15 Thread Eric Auger
Introduce a new helper whose purpose is to allocate a reserved
region. This will be used by IOMMU drivers implementing the reserved
region callbacks.

Signed-off-by: Eric Auger 
---
 drivers/iommu/iommu.c | 16 
 include/linux/iommu.h |  8 
 2 files changed, 24 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c7ed334..6ee529f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1562,6 +1562,22 @@ void iommu_put_resv_regions(struct device *dev, struct 
list_head *list)
ops->put_resv_regions(dev, list);
 }
 
+struct iommu_resv_region *iommu_alloc_resv_region(phys_addr_t start,
+ size_t length,
+ unsigned int prot)
+{
+   struct iommu_resv_region *region;
+
+   region = kzalloc(sizeof(*region), GFP_KERNEL);
+   if (!region)
+   return NULL;
+
+   region->start = start;
+   region->length = length;
+   region->prot = prot;
+   return region;
+}
+
 /* Request that a device is direct mapped by the IOMMU */
 int iommu_request_dm_for_dev(struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 02cf565..0aea877 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -241,6 +241,8 @@ extern void iommu_set_fault_handler(struct iommu_domain 
*domain,
 extern void iommu_get_resv_regions(struct device *dev, struct list_head *list);
 extern void iommu_put_resv_regions(struct device *dev, struct list_head *list);
 extern int iommu_request_dm_for_dev(struct device *dev);
+extern struct iommu_resv_region *
+iommu_alloc_resv_region(phys_addr_t start, size_t length, unsigned int prot);
 
 extern int iommu_attach_group(struct iommu_domain *domain,
  struct iommu_group *group);
@@ -454,6 +456,12 @@ static inline void iommu_put_resv_regions(struct device 
*dev,
 {
 }
 
+static inline struct iommu_resv_region *
+iommu_alloc_resv_region(phys_addr_t start, size_t length, unsigned int prot)
+{
+   return NULL;
+}
+
 static inline int iommu_request_dm_for_dev(struct device *dev)
 {
return -ENODEV;
-- 
1.9.1



[RFC v3 06/10] iommu: iommu_get_group_resv_regions

2016-11-15 Thread Eric Auger
Introduce iommu_get_group_resv_regions, whose role is to enumerate
all devices of the group and collect their reserved regions. It
also filters out duplicates.

Signed-off-by: Eric Auger 

---

- we do not move list elements from the device list to the group list,
  since iommu_put_resv_regions() could then no longer be called.
- at the moment I did not introduce any iommu_put_group_resv_regions
  helper, since it would simply consist of freeing the list (a minimal
  sketch is shown below)
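
  Were such a helper wanted later, a minimal sketch could look like the
  code below (the name and exact placement are assumptions, not part of
  this series):

static void iommu_put_group_resv_regions(struct list_head *head)
{
        struct iommu_resv_region *entry, *next;

        /* free the copies allocated by iommu_get_group_resv_regions() */
        list_for_each_entry_safe(entry, next, head, list)
                kfree(entry);
}
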
---
 drivers/iommu/iommu.c | 53 +++
 include/linux/iommu.h |  8 
 2 files changed, 61 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index a4530ad..e0fbcc5 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -133,6 +133,59 @@ static ssize_t iommu_group_show_name(struct iommu_group 
*group, char *buf)
return sprintf(buf, "%s\n", group->name);
 }
 
+static bool iommu_resv_region_present(struct iommu_resv_region *region,
+ struct list_head *head)
+{
+   struct iommu_resv_region *entry;
+
+   list_for_each_entry(entry, head, list) {
+   if ((region->start == entry->start) &&
+   (region->length == entry->length) &&
+   (region->prot == entry->prot))
+   return true;
+   }
+   return false;
+}
+
+static int
+iommu_insert_device_resv_regions(struct list_head *dev_resv_regions,
+struct list_head *group_resv_regions)
+{
+   struct iommu_resv_region *entry, *region;
+
+   list_for_each_entry(entry, dev_resv_regions, list) {
+   if (iommu_resv_region_present(entry, group_resv_regions))
+   continue;
+   region = iommu_alloc_resv_region(entry->start, entry->length,
+  entry->prot);
+   if (!region)
+   return -ENOMEM;
+
+   list_add_tail(>list, group_resv_regions);
+   }
+   return 0;
+}
+
+int iommu_get_group_resv_regions(struct iommu_group *group,
+struct list_head *head)
+{
+   struct iommu_device *device;
+   int ret = 0;
+
+   list_for_each_entry(device, >devices, list) {
+   struct list_head dev_resv_regions;
+
+   INIT_LIST_HEAD(_resv_regions);
+   iommu_get_resv_regions(device->dev, _resv_regions);
+   ret = iommu_insert_device_resv_regions(_resv_regions, head);
+   iommu_put_resv_regions(device->dev, _resv_regions);
+   if (ret)
+   break;
+   }
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_get_group_resv_regions);
+
 static IOMMU_GROUP_ATTR(name, S_IRUGO, iommu_group_show_name, NULL);
 
 static void iommu_group_release(struct kobject *kobj)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 0aea877..0f7ae2c 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -243,6 +243,8 @@ extern void iommu_set_fault_handler(struct iommu_domain 
*domain,
 extern int iommu_request_dm_for_dev(struct device *dev);
 extern struct iommu_resv_region *
 iommu_alloc_resv_region(phys_addr_t start, size_t length, unsigned int prot);
+extern int iommu_get_group_resv_regions(struct iommu_group *group,
+   struct list_head *head);
 
 extern int iommu_attach_group(struct iommu_domain *domain,
  struct iommu_group *group);
@@ -462,6 +464,12 @@ static inline void iommu_put_resv_regions(struct device 
*dev,
return NULL;
 }
 
+static inline int iommu_get_group_resv_regions(struct iommu_group *group,
+  struct list_head *head)
+{
+   return -ENODEV;
+}
+
 static inline int iommu_request_dm_for_dev(struct device *dev)
 {
return -ENODEV;
-- 
1.9.1



[RFC v3 05/10] iommu: Do not map reserved regions

2016-11-15 Thread Eric Auger
As we introduced IOMMU_RESV_NOMAP and IOMMU_RESV_MSI regions,
let's prevent those new regions from being mapped.

Signed-off-by: Eric Auger 
---
 drivers/iommu/iommu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 6ee529f..a4530ad 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -343,6 +343,9 @@ static int iommu_group_create_direct_mappings(struct 
iommu_group *group,
start = ALIGN(entry->start, pg_size);
end   = ALIGN(entry->start + entry->length, pg_size);
 
+   if (entry->prot & IOMMU_RESV_MASK)
+   continue;
+
for (addr = start; addr < end; addr += pg_size) {
phys_addr_t phys_addr;
 
-- 
1.9.1



Re: [PATCH v7 00/16] ACPI IORT ARM SMMU support

2016-11-15 Thread Rafael J. Wysocki
On Tue, Nov 15, 2016 at 11:12 AM, Lorenzo Pieralisi
 wrote:
> Hi Rafael,
>
> On Thu, Nov 10, 2016 at 12:36:12AM +0100, Rafael J. Wysocki wrote:
>> Hi Lorenzo,
>>
>> On Wed, Nov 9, 2016 at 3:19 PM, Lorenzo Pieralisi
>>  wrote:
>> > This patch series is v7 of a previous posting:
>> >
>> > https://lkml.org/lkml/2016/10/18/506
>>
>> I don't see anything objectionable in this series.
>>
>> Please let me know which patches in particular to look at in detail.
>
> Are patches 7 and consequently 16 (that builds on top of 7) ok with
> you ? They should be equivalent to nop on anything other than ARM64
> but they touch ACPI core so they require an ACK if they are ok please.

Yes, they are.  Please feel free to add my ACKs to those or I will
send them separately later today.

Thanks,
Rafael


Re: [PATCH v2] arm64: SMMU-v2: Workaround for Cavium ThunderX erratum 28168

2016-11-15 Thread Robin Murphy
On 15/11/16 09:26, Marc Zyngier wrote:
> On 15/11/16 07:00, Geetha sowjanya wrote:
>> From: Tirumalesh Chalamarla 
>>
>>   This patch implements Cavium ThunderX erratum 28168.
>>
>>   PCI requires stores complete in order. Due to erratum #28168
>>   PCI-inbound MSI-X store to the interrupt controller are delivered
>>   to the interrupt controller before older PCI-inbound memory stores
>>   are committed.
>>   Doing a sync on SMMU will make sure all prior data transfers are
>>   completed before invoking ISR.
>>
>> Signed-off-by: Tirumalesh Chalamarla 
>> Signed-off-by: Geetha sowjanya 
>> ---
>>  arch/arm64/Kconfig  |   11 +++
>>  arch/arm64/Kconfig.platforms|1 +
>>  arch/arm64/include/asm/cpufeature.h |3 ++-
>>  arch/arm64/kernel/cpu_errata.c  |   16 
>>  drivers/iommu/arm-smmu.c|   24 
>>  drivers/irqchip/irq-gic-common.h|1 +
>>  drivers/irqchip/irq-gic-v3.c|   19 +++
>>  7 files changed, 74 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 30398db..751972c 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -474,6 +474,17 @@ config CAVIUM_ERRATUM_27456
>>  
>>If unsure, say Y.
>>  
>> +config CAVIUM_ERRATUM_28168
>> +   bool "Cavium erratum 28168: Make sure DMA data transfer is done 
>> before MSIX"
>> +   depends on ARCH_THUNDER && ARM64
>> +   default y
>> +   help
>> +Due to erratum #28168 PCI-inbound MSI-X store to the interrupt
>> +controller are delivered to the interrupt controller before older
>> +PCI-inbound memory stores are committed. Doing a sync on SMMU
>> +will make sure all prior data transfers are done before invoking 
>> ISR.
>> +
>> +If unsure, say Y.
> 
> Where is the entry in Documentation/arm64/silicon-errata.txt?
> 
>>  endmenu
>>  
>>  
>> diff --git a/arch/arm64/Kconfig.platforms b/arch/arm64/Kconfig.platforms
>> index cfbdf02..2ac4ac6 100644
>> --- a/arch/arm64/Kconfig.platforms
>> +++ b/arch/arm64/Kconfig.platforms
>> @@ -185,6 +185,7 @@ config ARCH_SPRD
>>  
>>  config ARCH_THUNDER
>>  bool "Cavium Inc. Thunder SoC Family"
>> +select IRQ_PREFLOW_FASTEOI
>>  help
>>This enables support for Cavium's Thunder Family of SoCs.
>>  
>> diff --git a/arch/arm64/include/asm/cpufeature.h 
>> b/arch/arm64/include/asm/cpufeature.h
>> index 758d74f..821fc3c 100644
>> --- a/arch/arm64/include/asm/cpufeature.h
>> +++ b/arch/arm64/include/asm/cpufeature.h
>> @@ -40,8 +40,9 @@
>>  #define ARM64_HAS_32BIT_EL0 13
>>  #define ARM64_HYP_OFFSET_LOW14
>>  #define ARM64_MISMATCHED_CACHE_LINE_SIZE15
>> +#define ARM64_WORKAROUND_CAVIUM_28168   16
>>  
>> -#define ARM64_NCAPS 16
>> +#define ARM64_NCAPS 17
>>  
>>  #ifndef __ASSEMBLY__
>>  
>> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
>> index 0150394..0841a12 100644
>> --- a/arch/arm64/kernel/cpu_errata.c
>> +++ b/arch/arm64/kernel/cpu_errata.c
>> @@ -122,6 +122,22 @@ static void cpu_enable_trap_ctr_access(void *__unused)
>>  MIDR_RANGE(MIDR_THUNDERX_81XX, 0x00, 0x00),
>>  },
>>  #endif
>> +#ifdef CONFIG_CAVIUM_ERRATUM_28168
>> +{
>> +/* Cavium ThunderX, T88 pass 1.x - 2.1 */
>> +.desc = "Cavium erratum 28168",
>> +.capability = ARM64_WORKAROUND_CAVIUM_28168,
>> +MIDR_RANGE(MIDR_THUNDERX, 0x00,
>> +   (1 << MIDR_VARIANT_SHIFT) | 1),
>> +},
>> +{
>> +/* Cavium ThunderX, T81 pass 1.0 */
>> +.desc = "Cavium erratum 28168",
>> +.capability = ARM64_WORKAROUND_CAVIUM_28168,
>> +MIDR_RANGE(MIDR_THUNDERX_81XX, 0x00, 0x00),
>> +},
>> +#endif
> 
> How is that a CPU bug? Shouldn't that be keyed on the SMMU version or
> the ITs version?

Seconded. I assume the actual component at fault is the PCI RC, SMMU, or
interconnect in between - are there no ID registers in those parts that
will indicate a fixed version (or ECO) in future?

>> +
>>  {
>>  .desc = "Mismatched cache line size",
>>  .capability = ARM64_MISMATCHED_CACHE_LINE_SIZE,
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index c841eb7..1b4555c 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -570,6 +570,30 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device 
>> *smmu)
>>  }
>>  }
>>  
>> +/*
>> + * Cavium ThunderX erratum 28168
>> + *
>> + * Due to erratum #28168 PCI-inbound MSI-X store to the interrupt
>> + * controller are delivered to the interrupt controller before older
>> + * PCI-inbound memory stores are committed. Doing a sync on SMMU
>> + * will make sure all prior data 

Re: [RFC PATCH v3 04/20] x86: Handle reduction in physical address size with SME

2016-11-15 Thread Borislav Petkov
On Tue, Nov 15, 2016 at 01:10:35PM +0100, Joerg Roedel wrote:
> Maybe add a comment here why you can't use cpu_has (yet).

So that could be alleviated by moving this function *after*
init_scattered_cpuid_features(). Then you can simply do *cpu_has().

Also, I'm not sure why we're checking CPUID for the SME feature when we
have sme_get_me_mask() et al which have been setup much earlier...

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [RFC PATCH v3 04/20] x86: Handle reduction in physical address size with SME

2016-11-15 Thread Joerg Roedel
On Wed, Nov 09, 2016 at 06:35:13PM -0600, Tom Lendacky wrote:
> +/*
> + * AMD Secure Memory Encryption (SME) can reduce the size of the physical
> + * address space if it is enabled, even if memory encryption is not active.
> + * Adjust x86_phys_bits if SME is enabled.
> + */
> +static void phys_bits_adjust(struct cpuinfo_x86 *c)
> +{

Better call this function amd_sme_phys_bits_adjust(). This name makes it
clear at the call-site why it is there and what it does.

> + u32 eax, ebx, ecx, edx;
> + u64 msr;
> +
> + if (c->x86_vendor != X86_VENDOR_AMD)
> + return;
> +
> + if (c->extended_cpuid_level < 0x801f)
> + return;
> +
> + /* Check for SME feature */
> + cpuid(0x801f, , , , );
> + if (!(eax & 0x01))
> + return;

Maybe add a comment here why you can't use cpu_has (yet).



Re: [PATCH] iommu/dma: Stop getting dma_32bit_pfn wrong

2016-11-15 Thread Joerg Roedel
On Fri, Nov 11, 2016 at 06:30:45PM +, Robin Murphy wrote:
> iommu_dma_init_domain() was originally written under the misconception
> that dma_32bit_pfn represented some sort of size limit for IOVA domains.
> Since the truth is almost the exact opposite of that, rework the logic
> and comments to reflect its real purpose of optimising lookups when
> allocating from a subset of the available space.
> 
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/dma-iommu.c | 23 ++-
>  1 file changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index c5ab8667e6f2..ae045a14b530 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -139,6 +139,7 @@ int iommu_dma_init_domain(struct iommu_domain *domain, 
> dma_addr_t base,
>  {
>   struct iova_domain *iovad = cookie_iovad(domain);
>   unsigned long order, base_pfn, end_pfn;
> + bool pci = dev && dev_is_pci(dev);
>  
>   if (!iovad)
>   return -ENODEV;
> @@ -161,19 +162,31 @@ int iommu_dma_init_domain(struct iommu_domain *domain, 
> dma_addr_t base,
>   end_pfn = min_t(unsigned long, end_pfn,
>   domain->geometry.aperture_end >> order);
>   }
> + /*
> +  * PCI devices may have larger DMA masks, but still prefer allocating
> +  * within a 32-bit mask to avoid DAC addressing. Such limitations don't
> +  * apply to the typical platform device, so for those we may as well
> +  * leave the cache limit at the top of the range they're likely to use.
> +  */
> + if (pci)
> + end_pfn = min_t(unsigned long, end_pfn,
> + DMA_BIT_MASK(32) >> order);

Question, does it actually hurt platform devices to follow the same
allocation strategy as pci devices? I mean, does it hurt enough to
special-case the code here?



Joerg


Re: [bug report] iommu: Implement common IOMMU ops for DMA mapping

2016-11-15 Thread Robin Murphy
Hi Dan,

On 15/11/16 09:44, Dan Carpenter wrote:
> Hello Robin Murphy,
> 
> The patch 0db2e5d18f76: "iommu: Implement common IOMMU ops for DMA
> mapping" from Oct 1, 2015, leads to the following static checker
> warning:
> 
>   drivers/iommu/dma-iommu.c:377 iommu_dma_alloc()
>   warn: use 'gfp' here instead of GFP_XXX?
> 
> drivers/iommu/dma-iommu.c
>343  struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t 
> gfp,
>
> ^
> 
>344  unsigned long attrs, int prot, dma_addr_t *handle,
>345  void (*flush_page)(struct device *, const void *, 
> phys_addr_t))
>346  {
>347  struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>348  struct iova_domain *iovad = cookie_iovad(domain);
>349  struct iova *iova;
>350  struct page **pages;
>351  struct sg_table sgt;
>352  dma_addr_t dma_addr;
>353  unsigned int count, min_size, alloc_sizes = 
> domain->pgsize_bitmap;
>354  
>355  *handle = DMA_ERROR_CODE;
>356  
>357  min_size = alloc_sizes & -alloc_sizes;
>358  if (min_size < PAGE_SIZE) {
>359  min_size = PAGE_SIZE;
>360  alloc_sizes |= PAGE_SIZE;
>361  } else {
>362  size = ALIGN(size, min_size);
>363  }
>364  if (attrs & DMA_ATTR_ALLOC_SINGLE_PAGES)
>365  alloc_sizes = min_size;
>366  
>367  count = PAGE_ALIGN(size) >> PAGE_SHIFT;
>368  pages = __iommu_dma_alloc_pages(count, alloc_sizes >> 
> PAGE_SHIFT, gfp);
>   
> 
> Here we use gfp.
> 
>369  if (!pages)
>370  return NULL;
>371  
>372  iova = __alloc_iova(domain, size, dev->coherent_dma_mask);
>373  if (!iova)
>374  goto out_free_pages;
>375  
>376  size = iova_align(iovad, size);
>377  if (sg_alloc_table_from_pages(, pages, count, 0, size, 
> GFP_KERNEL))
>
> ^^
> Here we're using GFP_KERNEL.  I don't know the code well enough to say
> if it was intentional.

Yes, it's intentional - the SG table is a separate (transient) thing to
help map the buffer itself, and iommu_dma_alloc() should only be called
from blocking-safe contexts anyway (since __iommu_dma_alloc_pages() may
need to vmalloc() a large array to keep track of the pages if the buffer
being allocated is massive - the GFP_KERNEL in the other case there is
deliberate, too).

Thanks,
Robin.

> 
>378  goto out_free_iova;
>379  
>380  if (!(prot & IOMMU_CACHE)) {
>381  struct sg_mapping_iter miter;
>382  /*
>383   * The CPU-centric flushing implied by SG_MITER_TO_SG 
> isn't
>384   * sufficient here, so skip it by using the "wrong" 
> direction.
>385   */
>386  sg_miter_start(, sgt.sgl, sgt.orig_nents, 
> SG_MITER_FROM_SG);
> 
> regards,
> dan carpenter
> 



Re: [PATCH RESEND] iommu/iova: Extend cached node lookup condition

2016-11-15 Thread Joerg Roedel
On Fri, Nov 11, 2016 at 06:35:46PM +, Robin Murphy wrote:
> When searching for a free IOVA range, we optimise the tree traversal
> by starting from the cached32_node, instead of the last node, when
> limit_pfn is equal to dma_32bit_pfn. However, if limit_pfn happens to
> be smaller, then we'll go ahead and start from the top even though
> dma_32bit_pfn is still a more suitable upper bound. Since this is
> clearly a silly thing to do, adjust the lookup condition appropriately.
> 
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/iova.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied, thanks.



Re: [PATCH v2 1/5] iommu: Allow taking a reference on a group directly

2016-11-15 Thread Joerg Roedel
On Fri, Nov 11, 2016 at 05:59:21PM +, Robin Murphy wrote:
> iommu_group_get_for_dev() expects that the IOMMU driver's device_group
> callback return a group with a reference held for the given device.
> Whilst allocating a new group is fine, and pci_device_group() correctly
> handles reusing an existing group, there is no general means for IOMMU
> drivers doing their own group lookup to take additional references on an
> existing group pointer without having to also store device pointers or
> resort to elaborate trickery.
> 
> Add an IOMMU-driver-specific function to fill the hole.
> 
> Acked-by: Sricharan R 
> Signed-off-by: Robin Murphy 
> ---
> 
> v2: Fix the function name; clarify what exactly its callers are fixing.
> 
>  drivers/iommu/iommu.c | 13 +
>  include/linux/iommu.h |  1 +
>  2 files changed, 14 insertions(+)

Applied the series, thanks Robin.



[PATCH] iommu: fix iommu_free() comparison of unsigned expression >= 0

2016-11-15 Thread Nikola Pajkovsky
commit 8fd524b355da ("x86: Kill bad_dma_address variable") killed the
bad_dma_address variable and replaced it with the DMA_ERROR_CODE macro,
which is always zero. Since dma_addr is unsigned, the statement

   dma_addr >= DMA_ERROR_CODE

is always true and the check is not needed.

arch/x86/kernel/pci-calgary_64.c: In function ‘iommu_free’:
arch/x86/kernel/pci-calgary_64.c:299:2: warning: comparison of unsigned 
expression >= 0 is always true [-Wtype-limits]
  if (unlikely((dma_addr >= DMA_ERROR_CODE) && (dma_addr < badend))) {

Fixes: 8fd524b355da ("x86: Kill bad_dma_address variable")
Signed-off-by: Nikola Pajkovsky 
---
 arch/x86/kernel/pci-calgary_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/pci-calgary_64.c b/arch/x86/kernel/pci-calgary_64.c
index 5d400ba1349d..d47517941bbc 100644
--- a/arch/x86/kernel/pci-calgary_64.c
+++ b/arch/x86/kernel/pci-calgary_64.c
@@ -296,7 +296,7 @@ static void iommu_free(struct iommu_table *tbl, dma_addr_t 
dma_addr,
 
/* were we called with bad_dma_address? */
badend = DMA_ERROR_CODE + (EMERGENCY_PAGES * PAGE_SIZE);
-   if (unlikely((dma_addr >= DMA_ERROR_CODE) && (dma_addr < badend))) {
+   if (unlikely(dma_addr < badend)) {
WARN(1, KERN_ERR "Calgary: driver tried unmapping bad DMA "
   "address 0x%Lx\n", dma_addr);
return;
-- 
2.10.0


Re: [PATCH v7 00/16] ACPI IORT ARM SMMU support

2016-11-15 Thread Lorenzo Pieralisi
Hi Rafael,

On Thu, Nov 10, 2016 at 12:36:12AM +0100, Rafael J. Wysocki wrote:
> Hi Lorenzo,
> 
> On Wed, Nov 9, 2016 at 3:19 PM, Lorenzo Pieralisi
>  wrote:
> > This patch series is v7 of a previous posting:
> >
> > https://lkml.org/lkml/2016/10/18/506
> 
> I don't see anything objectionable in this series.
> 
> Please let me know which patches in particular to look at in detail.

Are patches 7 and consequently 16 (that builds on top of 7) ok with
you ? They should be equivalent to nop on anything other than ARM64
but they touch ACPI core so they require an ACK if they are ok please.

Thank you very much !
Lorenzo


Re: [PATCH v7 04/16] drivers: iommu: make of_iommu_set/get_ops() DT agnostic

2016-11-15 Thread Lorenzo Pieralisi
On Mon, Nov 14, 2016 at 06:25:16PM +, Robin Murphy wrote:
> On 14/11/16 15:52, Joerg Roedel wrote:
> > On Mon, Nov 14, 2016 at 12:00:47PM +, Robin Murphy wrote:
> >> If we've already made the decision to move away from bus ops, I don't
> >> see that it makes sense to deliberately introduce new dependencies on
> >> them. Besides, as it stands, this patch literally implements "tell the
> >> iommu-core which hardware-iommus exist in the system and a seperate
> >> iommu_ops ptr for each of them" straight off.
> > 
> > Not sure which code you are looking at, but as I see it we have only
> > per-device iommu-ops now (with this patch). That is different from
> > having core-visible hardware-iommu instances where devices could link
> > to.
> 
> The per-device IOMMU ops are already there since 57f98d2f61e1. This
> patch generalises the other end, moving the "registering an IOMMU
> instance" (i.e. iommu_fwentry) bit into the IOMMU core, from being
> OF-specific. I'd be perfectly happy if we rename iommu_fwentry to
> iommu_instance, fwnode_iommu_set_ops() to iommu_register_instance(), and
> such if that makes the design intent clearer.

I second that, and I need to know what to do with this patch sooner
rather than later, so it is time we make a decision please.

Joerg, what's your opinion ?

Thanks,
Lorenzo

> If you'd also prefer to replace iommu_fwspec::ops with an opaque
> iommu_fwspec::iommu_instance pointer so that things are a bit more
> centralised (and users are forced to go through the API rather then call
> ops directly), I'd have no major objection either. My main point is that
> we've been deliberately putting the relevant building blocks in place -
> the of_iommu_{get,set}_ops stuff was designed from the start to
> accommodate per-instance ops, via the ops pointer *being* the instance
> token; the iommu_fwspec stuff is deliberately intended to provide
> per-device ops on top of that. The raw functionality is either there in
> iommu.c already, or moving there in patches already written, so if it
> doesn't look right all we need to focus on is making it look right.
> 
> > Also the rest of iommu-core code still makes use of the per-bus ops. The
> > per-device ops are only used for the of_xlate fn-ptr.
> 
> Hence my aforementioned patches intended for 4.10, directly following on
> from introducing iommu_fwspec in 4.9:
> 
> http://www.mail-archive.com/iommu@lists.linux-foundation.org/msg14576.html
> 
> ...the purpose being to provide a smooth transition from per-bus ops to
> per-device, per-instance ops. Apply those and we're 90% of the way there
> for OF-based IOMMU drivers (not that any of those actually need
> per-instance ops, admittedly; I did prototype it for the ARM SMMU ages
> ago, but it didn't seem worth the bother). Lorenzo's series broadens the
> scope to ACPI-based systems and moves the generically-useful parts into
> the core where we can easily build on them further if necessary. The
> major remaining work is to convert external callers of the current
> bus-dependent functions like iommu_domain_alloc(), iommu_present(), etc.
> to device-based alternatives.
> 
> Robin.


[PATCH] iommu/exynos: Fix warnings from DMA-debug

2016-11-15 Thread Marek Szyprowski
Add simple checks for the dma_map_single() return value to make the
DMA-debug checker happy. Exynos IOMMU on Samsung Exynos SoCs always uses
a device with linear DMA mapping ops (DMA address is equal to physical
memory address), so no failures are returned from dma_map_single().

Signed-off-by: Marek Szyprowski 
---
 drivers/iommu/exynos-iommu.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 30808e9..7b0bd86 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -763,6 +763,7 @@ static struct iommu_domain 
*exynos_iommu_domain_alloc(unsigned type)
DMA_TO_DEVICE);
/* For mapping page table entries we rely on dma == phys */
BUG_ON(handle != virt_to_phys(domain->pgtable));
+   BUG_ON(dma_mapping_error(dma_dev, handle));
 
spin_lock_init(>lock);
spin_lock_init(>pgtablelock);
@@ -909,6 +910,7 @@ static sysmmu_pte_t *alloc_lv2entry(struct 
exynos_iommu_domain *domain,
}
 
if (lv1ent_fault(sent)) {
+   dma_addr_t handle;
sysmmu_pte_t *pent;
bool need_flush_flpd_cache = lv1ent_zero(sent);
 
@@ -920,7 +922,9 @@ static sysmmu_pte_t *alloc_lv2entry(struct 
exynos_iommu_domain *domain,
update_pte(sent, mk_lv1ent_page(virt_to_phys(pent)));
kmemleak_ignore(pent);
*pgcounter = NUM_LV2ENTRIES;
-   dma_map_single(dma_dev, pent, LV2TABLE_SIZE, DMA_TO_DEVICE);
+   handle = dma_map_single(dma_dev, pent, LV2TABLE_SIZE,
+   DMA_TO_DEVICE);
+   BUG_ON(dma_mapping_error(dma_dev, handle));
 
/*
 * If pre-fetched SLPD is a faulty SLPD in zero_l2_table,
-- 
1.9.1



[bug report] iommu: Implement common IOMMU ops for DMA mapping

2016-11-15 Thread Dan Carpenter
Hello Robin Murphy,

The patch 0db2e5d18f76: "iommu: Implement common IOMMU ops for DMA
mapping" from Oct 1, 2015, leads to the following static checker
warning:

drivers/iommu/dma-iommu.c:377 iommu_dma_alloc()
warn: use 'gfp' here instead of GFP_XXX?

drivers/iommu/dma-iommu.c
   343  struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t 
gfp,
   ^

   344  unsigned long attrs, int prot, dma_addr_t *handle,
   345  void (*flush_page)(struct device *, const void *, 
phys_addr_t))
   346  {
   347  struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
   348  struct iova_domain *iovad = cookie_iovad(domain);
   349  struct iova *iova;
   350  struct page **pages;
   351  struct sg_table sgt;
   352  dma_addr_t dma_addr;
   353  unsigned int count, min_size, alloc_sizes = 
domain->pgsize_bitmap;
   354  
   355  *handle = DMA_ERROR_CODE;
   356  
   357  min_size = alloc_sizes & -alloc_sizes;
   358  if (min_size < PAGE_SIZE) {
   359  min_size = PAGE_SIZE;
   360  alloc_sizes |= PAGE_SIZE;
   361  } else {
   362  size = ALIGN(size, min_size);
   363  }
   364  if (attrs & DMA_ATTR_ALLOC_SINGLE_PAGES)
   365  alloc_sizes = min_size;
   366  
   367  count = PAGE_ALIGN(size) >> PAGE_SHIFT;
   368  pages = __iommu_dma_alloc_pages(count, alloc_sizes >> 
PAGE_SHIFT, gfp);

  
Here we use gfp.

   369  if (!pages)
   370  return NULL;
   371  
   372  iova = __alloc_iova(domain, size, dev->coherent_dma_mask);
   373  if (!iova)
   374  goto out_free_pages;
   375  
   376  size = iova_align(iovad, size);
   377  if (sg_alloc_table_from_pages(, pages, count, 0, size, 
GFP_KERNEL))
   
^^
Here we're using GFP_KERNEL.  I don't know the code well enough to say
if it was intentional.

   378  goto out_free_iova;
   379  
   380  if (!(prot & IOMMU_CACHE)) {
   381  struct sg_mapping_iter miter;
   382  /*
   383   * The CPU-centric flushing implied by SG_MITER_TO_SG 
isn't
   384   * sufficient here, so skip it by using the "wrong" 
direction.
   385   */
   386  sg_miter_start(, sgt.sgl, sgt.orig_nents, 
SG_MITER_FROM_SG);

regards,
dan carpenter


Re: [PATCH v2] arm64: SMMU-v2: Workaround for Cavium ThunderX erratum 28168

2016-11-15 Thread Marc Zyngier
On 15/11/16 07:00, Geetha sowjanya wrote:
> From: Tirumalesh Chalamarla 
> 
>   This patch implements Cavium ThunderX erratum 28168.
> 
>   PCI requires stores complete in order. Due to erratum #28168
>   PCI-inbound MSI-X store to the interrupt controller are delivered
>   to the interrupt controller before older PCI-inbound memory stores
>   are committed.
>   Doing a sync on SMMU will make sure all prior data transfers are
>   completed before invoking ISR.
> 
> Signed-off-by: Tirumalesh Chalamarla 
> Signed-off-by: Geetha sowjanya 
> ---
>  arch/arm64/Kconfig  |   11 +++
>  arch/arm64/Kconfig.platforms|1 +
>  arch/arm64/include/asm/cpufeature.h |3 ++-
>  arch/arm64/kernel/cpu_errata.c  |   16 
>  drivers/iommu/arm-smmu.c|   24 
>  drivers/irqchip/irq-gic-common.h|1 +
>  drivers/irqchip/irq-gic-v3.c|   19 +++
>  7 files changed, 74 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 30398db..751972c 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -474,6 +474,17 @@ config CAVIUM_ERRATUM_27456
>  
> If unsure, say Y.
>  
> +config CAVIUM_ERRATUM_28168
> +   bool "Cavium erratum 28168: Make sure DMA data transfer is done 
> before MSIX"
> +   depends on ARCH_THUNDER && ARM64
> +   default y
> +   help
> +Due to erratum #28168 PCI-inbound MSI-X store to the interrupt
> +controller are delivered to the interrupt controller before older
> +PCI-inbound memory stores are committed. Doing a sync on SMMU
> +will make sure all prior data transfers are done before invoking ISR.
> +
> +If unsure, say Y.

Where is the entry in Documentation/arm64/silicon-errata.txt?

>  endmenu
>  
>  
> diff --git a/arch/arm64/Kconfig.platforms b/arch/arm64/Kconfig.platforms
> index cfbdf02..2ac4ac6 100644
> --- a/arch/arm64/Kconfig.platforms
> +++ b/arch/arm64/Kconfig.platforms
> @@ -185,6 +185,7 @@ config ARCH_SPRD
>  
>  config ARCH_THUNDER
>   bool "Cavium Inc. Thunder SoC Family"
> + select IRQ_PREFLOW_FASTEOI
>   help
> This enables support for Cavium's Thunder Family of SoCs.
>  
> diff --git a/arch/arm64/include/asm/cpufeature.h 
> b/arch/arm64/include/asm/cpufeature.h
> index 758d74f..821fc3c 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -40,8 +40,9 @@
>  #define ARM64_HAS_32BIT_EL0  13
>  #define ARM64_HYP_OFFSET_LOW 14
>  #define ARM64_MISMATCHED_CACHE_LINE_SIZE 15
> +#define ARM64_WORKAROUND_CAVIUM_28168  16
>  
> -#define ARM64_NCAPS  16
> +#define ARM64_NCAPS  17
>  
>  #ifndef __ASSEMBLY__
>  
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index 0150394..0841a12 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -122,6 +122,22 @@ static void cpu_enable_trap_ctr_access(void *__unused)
>   MIDR_RANGE(MIDR_THUNDERX_81XX, 0x00, 0x00),
>   },
>  #endif
> +#ifdef CONFIG_CAVIUM_ERRATUM_28168
> + {
> + /* Cavium ThunderX, T88 pass 1.x - 2.1 */
> + .desc = "Cavium erratum 28168",
> + .capability = ARM64_WORKAROUND_CAVIUM_28168,
> + MIDR_RANGE(MIDR_THUNDERX, 0x00,
> +(1 << MIDR_VARIANT_SHIFT) | 1),
> + },
> + {
> + /* Cavium ThunderX, T81 pass 1.0 */
> + .desc = "Cavium erratum 28168",
> + .capability = ARM64_WORKAROUND_CAVIUM_28168,
> + MIDR_RANGE(MIDR_THUNDERX_81XX, 0x00, 0x00),
> + },
> +#endif

How is that a CPU bug? Shouldn't that be keyed on the SMMU version or
the ITs version?

> +
>   {
>   .desc = "Mismatched cache line size",
>   .capability = ARM64_MISMATCHED_CACHE_LINE_SIZE,
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index c841eb7..1b4555c 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -570,6 +570,30 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device 
> *smmu)
>   }
>  }
>  
> +/*
> + * Cavium ThunderX erratum 28168
> + *
> + * Due to erratum #28168 PCI-inbound MSI-X store to the interrupt
> + * controller are delivered to the interrupt controller before older
> + * PCI-inbound memory stores are committed. Doing a sync on SMMU
> + * will make sure all prior data transfers are completed before
> + * invoking ISR.
> + *
> + */
> +void cavium_arm_smmu_tlb_sync(struct device *dev)
> +{
> + struct iommu_fwspec *fwspec = dev->iommu_fwspec;
> + struct arm_smmu_device *smmu;
> +
> + smmu = fwspec_smmu(fwspec);
> + if (!smmu)
> + return;
> + __arm_smmu_tlb_sync(smmu);
> +
> +}
>