RE: [Qemu-devel] [RFC PATCH 09/20] Memory: introduce iommu_ops->record_device

2017-05-19 Thread Tian, Kevin
> From: Liu, Yi L [mailto:yi.l@linux.intel.com]
> Sent: Friday, May 19, 2017 1:24 PM
> 
> Hi Alex,
> 
> What's your opinion on Tianyu's question? Is it acceptable
> to use the VFIO API in the intel_iommu emulator?

Do you actually need such a translation at all? The SID should be
filled in by the kernel IOMMU driver based on which device the
invalidation request targets, regardless of which guest SID is used
in user space. QEMU only needs to know which fd corresponds to the
guest SID, and then initiate an invalidation request on that fd?
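
To make that concrete, here is a minimal sketch of the only table QEMU
would have to keep for this approach. The type and function names are
purely illustrative (nothing like them exists in this series), and the
host SID never appears in it -- the kernel IOMMU driver fills that in:

#include <stdint.h>

/*
 * Sketch only: map a guest source-id to the VFIO device fd backing it.
 * Names are hypothetical, not part of the RFC or of VFIO.
 */
typedef struct VTDDeviceFd {
    uint16_t gsid;     /* guest source-id (bus:devfn as the guest sees it) */
    int      vfio_fd;  /* fd of the assigned device backing this guest SID */
} VTDDeviceFd;

/* Return the fd to send the invalidation request on, or -1 if unknown. */
static int vtd_fd_for_gsid(const VTDDeviceFd *map, int nr, uint16_t gsid)
{
    for (int i = 0; i < nr; i++) {
        if (map[i].gsid == gsid) {
            return map[i].vfio_fd;
        }
    }
    return -1;
}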

> 
> Thanks,
> Yi L
> On Fri, Apr 28, 2017 at 02:46:16PM +0800, Lan Tianyu wrote:
> > On 2017-04-26 18:06, Liu, Yi L wrote:
> > > With a vIOMMU exposed to the guest, the vIOMMU emulator needs to do
> > > translation between host and guest. E.g. for a device-selective TLB
> > > flush, the vIOMMU emulator needs to replace the guest SID with the
> > > host SID so as to limit the scope of the invalidation. This patch
> > > introduces a new callback iommu_ops->record_device() to notify the
> > > vIOMMU emulator to record the necessary information about the
> > > assigned device.
> >
> > This patch prepares for translating a guest sbdf to a host sbdf.
> >
> > Alex:
> > Could we add a new VFIO API to do such a translation? That would be more
> > straightforward than storing the host sbdf in the vIOMMU device model.
> >
> > >
> > > Signed-off-by: Liu, Yi L 
> > > ---
> > >  include/exec/memory.h | 11 +++
> > >  memory.c  | 12 
> > >  2 files changed, 23 insertions(+)
> > >
> > > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > > index 7bd13ab..49087ef 100644
> > > --- a/include/exec/memory.h
> > > +++ b/include/exec/memory.h
> > > @@ -203,6 +203,8 @@ struct MemoryRegionIOMMUOps {
> > >  IOMMUNotifierFlag new_flags);
> > >  /* Set this up to provide customized IOMMU replay function */
> > >  void (*replay)(MemoryRegion *iommu, IOMMUNotifier *notifier);
> > > +void (*record_device)(MemoryRegion *iommu,
> > > +  void *device_info);
> > >  };
> > >
> > >  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> > > @@ -708,6 +710,15 @@ void
> memory_region_notify_iommu(MemoryRegion *mr,
> > >  void memory_region_notify_one(IOMMUNotifier *notifier,
> > >IOMMUTLBEntry *entry);
> > >
> > > +/*
> > > + * memory_region_notify_device_record: notify IOMMU to record
> assign
> > > + * device.
> > > + * @mr: the memory region to notify
> > > + * @ device_info: device information
> > > + */
> > > +void memory_region_notify_device_record(MemoryRegion *mr,
> > > +void *info);
> > > +
> > >  /**
> > >   * memory_region_register_iommu_notifier: register a notifier for
> changes to
> > >   * IOMMU translation entries.
> > > diff --git a/memory.c b/memory.c
> > > index 0728e62..45ef069 100644
> > > --- a/memory.c
> > > +++ b/memory.c
> > > @@ -1600,6 +1600,18 @@ static void
> memory_region_update_iommu_notify_flags(MemoryRegion *mr)
> > >  mr->iommu_notify_flags = flags;
> > >  }
> > >
> > > +void memory_region_notify_device_record(MemoryRegion *mr,
> > > +void *info)
> > > +{
> > > +assert(memory_region_is_iommu(mr));
> > > +
> > > +if (mr->iommu_ops->record_device) {
> > > +mr->iommu_ops->record_device(mr, info);
> > > +}
> > > +
> > > +return;
> > > +}
> > > +
> > >  void memory_region_register_iommu_notifier(MemoryRegion *mr,
> > > IOMMUNotifier *n)
> > >  {
> > >
> >
> >

[PATCH] iommu/amd: Ratelimit io-page-faults per device

2017-05-19 Thread Joerg Roedel
From: Joerg Roedel 

Misbehaving devices can cause an endless chain of
io-page-faults, flooding dmesg and making the system log
unusable or even preventing the system from booting.

So ratelimit the error messages about io-page-faults on a
per-device basis.
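
The per-device pattern this boils down to is roughly the following
sketch (names are illustrative; the helpers are the existing ones from
linux/ratelimit.h):

#include <linux/ratelimit.h>
#include <linux/printk.h>
#include <linux/types.h>

/*
 * Sketch only: keep one ratelimit_state per device, initialize it once
 * when the per-device data is allocated, and gate every IO_PAGE_FAULT
 * message on __ratelimit() so a single misbehaving device cannot flood
 * the log.
 */
struct example_dev_data {
	struct ratelimit_state rs;
};

static void example_dev_init(struct example_dev_data *d)
{
	ratelimit_default_init(&d->rs);	/* defaults: at most 10 messages per 5s */
}

static void example_report_fault(struct example_dev_data *d, u64 address)
{
	if (__ratelimit(&d->rs))
		pr_err("IO_PAGE_FAULT address=0x%016llx\n", address);
}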

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd_iommu.c | 33 +
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 63cacf5..f574a0f 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -138,6 +138,8 @@ struct iommu_dev_data {
 PPR completions */
u32 errata;   /* Bitmap for errata to apply */
bool use_vapic;   /* Enable device to use vapic mode */
+
+   struct ratelimit_state rs;/* Ratelimit IOPF messages */
 };
 
 /*
@@ -253,6 +255,8 @@ static struct iommu_dev_data *alloc_dev_data(u16 devid)
list_add_tail(&dev_data->dev_data_list, &dev_data_list);
spin_unlock_irqrestore(&dev_data_list_lock, flags);
 
+   ratelimit_default_init(&dev_data->rs);
+
return dev_data;
 }
 
@@ -551,6 +555,30 @@ static void dump_command(unsigned long phys_addr)
pr_err("AMD-Vi: CMD[%d]: %08x\n", i, cmd->data[i]);
 }
 
+static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
+   u64 address, int flags)
+{
+   struct iommu_dev_data *dev_data = NULL;
+   struct pci_dev *pdev;
+
+   pdev = pci_get_bus_and_slot(PCI_BUS_NUM(devid), devid & 0xff);
+   if (pdev)
+   dev_data = get_dev_data(&pdev->dev);
+
+   if (dev_data && __ratelimit(&dev_data->rs)) {
+   dev_err(&pdev->dev, "IO_PAGE_FAULT domain=0x%04x address=0x%016llx flags=0x%04x]\n",
+   domain_id, address, flags);
+   } else if (printk_ratelimit()) {
+   pr_err("IO_PAGE_FAULT device=%02x:%02x.%x "
+  "domain=0x%04x address=0x%016llx flags=0x%04x]\n",
+   PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
+   domain_id, address, flags);
+   }
+
+   if (pdev)
+   pci_dev_put(pdev);
+}
+
 static void iommu_print_event(struct amd_iommu *iommu, void *__evt)
 {
int type, devid, domid, flags;
@@ -586,10 +614,7 @@ static void iommu_print_event(struct amd_iommu *iommu, void *__evt)
dump_dte_entry(devid);
break;
case EVENT_TYPE_IO_FAULT:
-   printk("IO_PAGE_FAULT device=%02x:%02x.%x "
-  "domain=0x%04x address=0x%016llx flags=0x%04x]\n",
-  PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
-  domid, address, flags);
+   amd_iommu_report_page_fault(devid, domid, address, flags);
break;
case EVENT_TYPE_DEV_TAB_ERR:
printk("DEV_TAB_HARDWARE_ERROR device=%02x:%02x.%x "
-- 
2.7.4



[git pull] IOMMU Fixes for Linux v4.12-rc1

2017-05-19 Thread Joerg Roedel
Hi Linus,

The following changes since commit 2ea659a9ef488125eb46da6eb571de5eae5c43f6:

  Linux 4.12-rc1 (2017-05-13 13:19:49 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git tags/iommu-fixes-v4.12-rc1

for you to fetch changes up to 745b6e74704782488dd875292bc49e24d23e81fd:

  iommu/mediatek: Include linux/dma-mapping.h (2017-05-17 14:51:54 +0200)


IOMMU Fixes for Linux v4.12-rc1

Including:

* Another compile-fix as a fallout of the recent header-file
  cleanup

* Add a missing IO/TLB flush to the Intel VT-d kdump code path

* A fix for ARM64 dma code to only access initialized
  iova_domain members


Arnd Bergmann (1):
  iommu/mediatek: Include linux/dma-mapping.h

KarimAllah Ahmed (1):
  iommu/vt-d: Flush the IOTLB to get rid of the initial kdump mappings

Robin Murphy (1):
  iommu/dma: Don't touch invalid iova_domain members

 drivers/iommu/dma-iommu.c| 13 -
 drivers/iommu/intel-iommu.c  |  5 -
 drivers/iommu/mtk_iommu_v1.c |  1 +
 3 files changed, 13 insertions(+), 6 deletions(-)

Please pull.

Thanks,

Joerg



Re: [Qemu-devel] [RFC PATCH 09/20] Memory: introduce iommu_ops->record_device

2017-05-19 Thread Liu, Yi L
On Fri, May 19, 2017 at 09:07:49AM +0000, Tian, Kevin wrote:
> > From: Liu, Yi L [mailto:yi.l@linux.intel.com]
> > Sent: Friday, May 19, 2017 1:24 PM
> > 
> > Hi Alex,
> > 
> > What's your opinion on Tianyu's question? Is it acceptable
> > to use the VFIO API in the intel_iommu emulator?
> 
> Do you actually need such a translation at all? The SID should be
> filled in by the kernel IOMMU driver based on which device the
> invalidation request targets, regardless of which guest SID is used
> in user space. QEMU only needs to know which fd corresponds to the
> guest SID, and then initiate an invalidation request on that fd?

Kevin,

It actually depends on the SVM binding behavior we expect on the host
IOMMU driver side. If we want the binding to be per-device, this
translation is needed in QEMU, either in VFIO or in the intel_iommu
emulator, so that the host SID can be used as a device selector when
looping over the devices in a group.
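
As a rough sketch, the kind of per-device record that record_device()
would need to hand to the vIOMMU emulator for this could look like the
following (illustrative only; the actual device_info layout is not
defined by this RFC):

#include <stdint.h>

/*
 * Illustrative only: one record per assigned device, so the guest SID in
 * a device-selective invalidation can be replaced by the host SID before
 * the request is propagated towards the host IOMMU driver.
 */
typedef struct VTDAssignedDevice {
    uint16_t guest_sid;  /* bus:devfn as programmed by the guest */
    uint16_t host_sid;   /* bus:devfn of the physical device on the host */
} VTDAssignedDevice;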

If we can use the VFIO API directly, we may also trigger the SVM bind/QI
propagation directly instead of going through a notifier.

Thanks,
Yi L
 
> > 
> > Thanks,
> > Yi L
> > On Fri, Apr 28, 2017 at 02:46:16PM +0800, Lan Tianyu wrote:
> > > On 2017-04-26 18:06, Liu, Yi L wrote:
> > > > With a vIOMMU exposed to the guest, the vIOMMU emulator needs to do
> > > > translation between host and guest. E.g. for a device-selective TLB
> > > > flush, the vIOMMU emulator needs to replace the guest SID with the
> > > > host SID so as to limit the scope of the invalidation. This patch
> > > > introduces a new callback iommu_ops->record_device() to notify the
> > > > vIOMMU emulator to record the necessary information about the
> > > > assigned device.
> > >
> > > This patch prepares for translating a guest sbdf to a host sbdf.
> > >
> > > Alex:
> > >   Could we add a new VFIO API to do such a translation? That would be more
> > > straightforward than storing the host sbdf in the vIOMMU device model.
> > >
> > > >
> > > > Signed-off-by: Liu, Yi L 
> > > > ---
> > > >  include/exec/memory.h | 11 +++
> > > >  memory.c  | 12 
> > > >  2 files changed, 23 insertions(+)
> > > >
> > > > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > > > index 7bd13ab..49087ef 100644
> > > > --- a/include/exec/memory.h
> > > > +++ b/include/exec/memory.h
> > > > @@ -203,6 +203,8 @@ struct MemoryRegionIOMMUOps {
> > > >  IOMMUNotifierFlag new_flags);
> > > >  /* Set this up to provide customized IOMMU replay function */
> > > >  void (*replay)(MemoryRegion *iommu, IOMMUNotifier *notifier);
> > > > +void (*record_device)(MemoryRegion *iommu,
> > > > +  void *device_info);
> > > >  };
> > > >
> > > >  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> > > > @@ -708,6 +710,15 @@ void
> > memory_region_notify_iommu(MemoryRegion *mr,
> > > >  void memory_region_notify_one(IOMMUNotifier *notifier,
> > > >IOMMUTLBEntry *entry);
> > > >
> > > > +/*
> > > > + * memory_region_notify_device_record: notify IOMMU to record
> > assign
> > > > + * device.
> > > > + * @mr: the memory region to notify
> > > > + * @ device_info: device information
> > > > + */
> > > > +void memory_region_notify_device_record(MemoryRegion *mr,
> > > > +void *info);
> > > > +
> > > >  /**
> > > >   * memory_region_register_iommu_notifier: register a notifier for
> > changes to
> > > >   * IOMMU translation entries.
> > > > diff --git a/memory.c b/memory.c
> > > > index 0728e62..45ef069 100644
> > > > --- a/memory.c
> > > > +++ b/memory.c
> > > > @@ -1600,6 +1600,18 @@ static void
> > memory_region_update_iommu_notify_flags(MemoryRegion *mr)
> > > >  mr->iommu_notify_flags = flags;
> > > >  }
> > > >
> > > > +void memory_region_notify_device_record(MemoryRegion *mr,
> > > > +void *info)
> > > > +{
> > > > +assert(memory_region_is_iommu(mr));
> > > > +
> > > > +if (mr->iommu_ops->record_device) {
> > > > +mr->iommu_ops->record_device(mr, info);
> > > > +}
> > > > +
> > > > +return;
> > > > +}
> > > > +
> > > >  void memory_region_register_iommu_notifier(MemoryRegion *mr,
> > > > IOMMUNotifier *n)
> > > >  {
> > > >
> > >
> > >

[PATCH] iommu/amd: flush IOTLB for specific domains only (v2)

2017-05-19 Thread arindam . nath
From: Arindam Nath 

Change History
--

v2: changes suggested by Joerg
- add flush flag to improve efficiency of flush operation

v1:
- The idea behind flush queues is to defer the IOTLB flushing
  for domains for which the mappings are no longer valid. We
  add such domains in queue_add(), and when the queue size
  reaches FLUSH_QUEUE_SIZE, we perform __queue_flush().

  Since we have already taken the lock before __queue_flush()
  is called, we need to make sure the IOTLB flushing is
  performed as quickly as possible.

  In the current implementation, we perform IOTLB flushing
  for all domains irrespective of which ones were actually
  added to the flush queue initially. This can be quite
  expensive, especially for domains for which no unmapping is
  required at this point in time.

  This patch makes use of the domain information in
  'struct flush_queue_entry' to make sure we only flush
  IOTLBs for the domains that need it, skipping the others.

Suggested-by: Joerg Roedel 
Signed-off-by: Arindam Nath 
---
 drivers/iommu/amd_iommu.c   | 27 ---
 drivers/iommu/amd_iommu_types.h |  2 ++
 2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 63cacf5..1edeebec 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2227,15 +2227,26 @@ static struct iommu_group *amd_iommu_device_group(struct device *dev)
 
 static void __queue_flush(struct flush_queue *queue)
 {
-   struct protection_domain *domain;
-   unsigned long flags;
int idx;
 
-   /* First flush TLB of all known domains */
-   spin_lock_irqsave(&amd_iommu_pd_lock, flags);
-   list_for_each_entry(domain, &amd_iommu_pd_list, list)
-   domain_flush_tlb(domain);
-   spin_unlock_irqrestore(&amd_iommu_pd_lock, flags);
+   /* First flush TLB of all domains which were added to flush queue */
+   for (idx = 0; idx < queue->next; ++idx) {
+   struct flush_queue_entry *entry;
+
+   entry = queue->entries + idx;
+
+   /*
+* There might be cases where multiple IOVA entries for the
+* same domain are queued in the flush queue. To avoid
+* flushing the same domain again, we check whether the
+* flag is set or not. This improves the efficiency of
+* flush operation.
+*/
+   if (!entry->dma_dom->domain.already_flushed) {
+   entry->dma_dom->domain.already_flushed = true;
+   domain_flush_tlb(&entry->dma_dom->domain);
+   }
+   }
 
/* Wait until flushes have completed */
domain_flush_complete(NULL);
@@ -2289,6 +2300,8 @@ static void queue_add(struct dma_ops_domain *dma_dom,
pages = __roundup_pow_of_two(pages);
address >>= PAGE_SHIFT;
 
+   dma_dom->domain.already_flushed = false;
+
queue = get_cpu_ptr(&flush_queue);
spin_lock_irqsave(&queue->lock, flags);
 
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 4de8f41..4f5519d 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -454,6 +454,8 @@ struct protection_domain {
bool updated;   /* complete domain flush required */
unsigned dev_cnt;   /* devices assigned to this domain */
unsigned dev_iommu[MAX_IOMMUS]; /* per-IOMMU reference count */
+   bool already_flushed;   /* flag to avoid flushing the same domain again
+  in a single invocation of __queue_flush() */
 };
 
 /*
-- 
2.7.4



Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption

2017-05-19 Thread Borislav Petkov
On Tue, Apr 18, 2017 at 04:22:23PM -0500, Tom Lendacky wrote:
> Add support to check if SME has been enabled and if memory encryption
> should be activated (checking of command line option based on the
> configuration of the default state).  If memory encryption is to be
> activated, then the encryption mask is set and the kernel is encrypted
> "in place."
> 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/kernel/head_64.S |1 +
>  arch/x86/mm/mem_encrypt.c |   83 
> +++--
>  2 files changed, 80 insertions(+), 4 deletions(-)

...

> +unsigned long __init sme_enable(struct boot_params *bp)
>  {
> + const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
> + unsigned int eax, ebx, ecx, edx;
> + unsigned long me_mask;
> + bool active_by_default;
> + char buffer[16];
> + u64 msr;
> +
> + /* Check for the SME support leaf */
> + eax = 0x80000000;
> + ecx = 0;
> + native_cpuid(&eax, &ebx, &ecx, &edx);
> + if (eax < 0x8000001f)
> + goto out;
> +
> + /*
> +  * Check for the SME feature:
> +  *   CPUID Fn8000_001F[EAX] - Bit 0
> +  * Secure Memory Encryption support
> +  *   CPUID Fn8000_001F[EBX] - Bits 5:0
> +  * Pagetable bit position used to indicate encryption
> +  */
> + eax = 0x8000001f;
> + ecx = 0;
> + native_cpuid(&eax, &ebx, &ecx, &edx);
> + if (!(eax & 1))
> + goto out;

< newline here.

> + me_mask = 1UL << (ebx & 0x3f);
> +
> + /* Check if SME is enabled */
> + msr = __rdmsr(MSR_K8_SYSCFG);
> + if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
> + goto out;
> +
> + /*
> +  * Fixups have not been applied to phys_base yet, so we must obtain
> +  * the address to the SME command line option data in the following
> +  * way.
> +  */
> + asm ("lea sme_cmdline_arg(%%rip), %0"
> +  : "=r" (cmdline_arg)
> +  : "p" (sme_cmdline_arg));
> + asm ("lea sme_cmdline_on(%%rip), %0"
> +  : "=r" (cmdline_on)
> +  : "p" (sme_cmdline_on));
> + asm ("lea sme_cmdline_off(%%rip), %0"
> +  : "=r" (cmdline_off)
> +  : "p" (sme_cmdline_off));
> +
> + if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT))
> + active_by_default = true;
> + else
> + active_by_default = false;
> +
> + cmdline_ptr = (const char *)((u64)bp->hdr.cmd_line_ptr |
> +  ((u64)bp->ext_cmd_line_ptr << 32));
> +
> + cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer));
> +
> + if (strncmp(buffer, cmdline_on, sizeof(buffer)) == 0)
> + sme_me_mask = me_mask;

Why doesn't simply

if (!strncmp(buffer, "on", 2))
...

work?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption

2017-05-19 Thread Borislav Petkov
On Fri, Apr 21, 2017 at 01:56:13PM -0500, Tom Lendacky wrote:
> On 4/18/2017 4:22 PM, Tom Lendacky wrote:
> > Add support to check if SME has been enabled and if memory encryption
> > should be activated (checking of command line option based on the
> > configuration of the default state).  If memory encryption is to be
> > activated, then the encryption mask is set and the kernel is encrypted
> > "in place."
> > 
> > Signed-off-by: Tom Lendacky 
> > ---
> >  arch/x86/kernel/head_64.S |1 +
> >  arch/x86/mm/mem_encrypt.c |   83 
> > +++--
> >  2 files changed, 80 insertions(+), 4 deletions(-)
> > 
> 
> ...
> 
> > 
> > -unsigned long __init sme_enable(void)
> > +unsigned long __init sme_enable(struct boot_params *bp)
> >  {
> > +   const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
> > +   unsigned int eax, ebx, ecx, edx;
> > +   unsigned long me_mask;
> > +   bool active_by_default;
> > +   char buffer[16];
> 
> So it turns out that when KASLR is enabled (CONFIG_RANDOMIZE_BASE=y)
> the stack-protector support causes issues with this function because

What issues?

> it is called so early. I can get past it by adding:
> 
> CFLAGS_mem_encrypt.o := $(nostackp)
> 
> in the arch/x86/mm/Makefile, but that obviously eliminates the support
> for the whole file.  Would it be better to split out the sme_enable()
> and other boot routines into a separate file or just apply the
> $(nostackp) to the whole file?

Josh might have a better idea here... CCed.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH 1/2] acpica: iort: Update SMMU models for IORT rev. C

2017-05-19 Thread Robert Richter
On 12.05.17 11:41:41, Robin Murphy wrote:
> IORT revision C has been published with a number of new SMMU
> implementation identifiers; define them.
> 
> CC: Rafael J. Wysocki 
> CC: Robert Moore 
> CC: Lv Zheng 
> Signed-off-by: Robin Murphy 

As an additional note, could both of these patches be marked for stable?
If we are going to deploy firmware with the new model number, older kernels
become unusable, as the SMMU is no longer detected.

Thanks,

-Robert


Re: [PATCH] iommu/amd: flush IOTLB for specific domains only (v2)

2017-05-19 Thread Jan Vesely
On Fri, 2017-05-19 at 15:32 +0530, arindam.n...@amd.com wrote:
> From: Arindam Nath 
> 
> Change History
> --
> 
> v2: changes suggested by Joerg
> - add flush flag to improve efficiency of flush operation
> 
> v1:
> - The idea behind flush queues is to defer the IOTLB flushing
>   for domains for which the mappings are no longer valid. We
>   add such domains in queue_add(), and when the queue size
>   reaches FLUSH_QUEUE_SIZE, we perform __queue_flush().
> 
>   Since we have already taken lock before __queue_flush()
>   is called, we need to make sure the IOTLB flushing is
>   performed as quickly as possible.
> 
>   In the current implementation, we perform IOTLB flushing
>   for all domains irrespective of which ones were actually
>   added in the flush queue initially. This can be quite
>   expensive especially for domains for which unmapping is
>   not required at this point of time.
> 
>   This patch makes use of domain information in
>   'struct flush_queue_entry' to make sure we only flush
>   IOTLBs for domains who need it, skipping others.

Hi,

just a note, the patch also fixes "AMD-Vi: Completion-Wait loop timed
out" boot hang on my system (carrizo + iceland) [0,1,2]. Presumably the
old loop also tried to flush domains that included powered-off devices.

regards,
Jan

[0] https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/issues/20
[1] https://bugs.freedesktop.org/show_bug.cgi?id=101029
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1448121

> 
> Suggested-by: Joerg Roedel 
> Signed-off-by: Arindam Nath 
> ---
>  drivers/iommu/amd_iommu.c   | 27 ---
>  drivers/iommu/amd_iommu_types.h |  2 ++
>  2 files changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 63cacf5..1edeebec 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -2227,15 +2227,26 @@ static struct iommu_group 
> *amd_iommu_device_group(struct device *dev)
>  
>  static void __queue_flush(struct flush_queue *queue)
>  {
> - struct protection_domain *domain;
> - unsigned long flags;
>   int idx;
>  
> - /* First flush TLB of all known domains */
> - spin_lock_irqsave(&amd_iommu_pd_lock, flags);
> - list_for_each_entry(domain, &amd_iommu_pd_list, list)
> - domain_flush_tlb(domain);
> - spin_unlock_irqrestore(&amd_iommu_pd_lock, flags);
> + /* First flush TLB of all domains which were added to flush queue */
> + for (idx = 0; idx < queue->next; ++idx) {
> + struct flush_queue_entry *entry;
> +
> + entry = queue->entries + idx;
> +
> + /*
> +  * There might be cases where multiple IOVA entries for the
> +  * same domain are queued in the flush queue. To avoid
> +  * flushing the same domain again, we check whether the
> +  * flag is set or not. This improves the efficiency of
> +  * flush operation.
> +  */
> + if (!entry->dma_dom->domain.already_flushed) {
> + entry->dma_dom->domain.already_flushed = true;
> + domain_flush_tlb(&entry->dma_dom->domain);
> + }
> + }
>  
>   /* Wait until flushes have completed */
>   domain_flush_complete(NULL);
> @@ -2289,6 +2300,8 @@ static void queue_add(struct dma_ops_domain *dma_dom,
>   pages = __roundup_pow_of_two(pages);
>   address >>= PAGE_SHIFT;
>  
> + dma_dom->domain.already_flushed = false;
> +
>   queue = get_cpu_ptr(&flush_queue);
>   spin_lock_irqsave(&queue->lock, flags);
>  
> diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
> index 4de8f41..4f5519d 100644
> --- a/drivers/iommu/amd_iommu_types.h
> +++ b/drivers/iommu/amd_iommu_types.h
> @@ -454,6 +454,8 @@ struct protection_domain {
>   bool updated;   /* complete domain flush required */
>   unsigned dev_cnt;   /* devices assigned to this domain */
>   unsigned dev_iommu[MAX_IOMMUS]; /* per-IOMMU reference count */
> + bool already_flushed;   /* flag to avoid flushing the same domain again
> +in a single invocation of __queue_flush() */
>  };
>  
>  /*

-- 
Jan Vesely 


Re: [PATCH v5 19/32] x86/mm: Add support to access persistent memory in the clear

2017-05-19 Thread Tom Lendacky

On 5/16/2017 9:04 AM, Borislav Petkov wrote:

On Tue, Apr 18, 2017 at 04:19:42PM -0500, Tom Lendacky wrote:

Persistent memory is expected to persist across reboots. The encryption
key used by SME will change across reboots which will result in corrupted
persistent memory.  Persistent memory is handed out by block devices
through memory remapping functions, so be sure not to map this memory as
encrypted.

Signed-off-by: Tom Lendacky 
---
 arch/x86/mm/ioremap.c |   31 ++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index bce0604..55317ba 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -425,17 +425,46 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
  * Examine the physical address to determine if it is an area of memory
  * that should be mapped decrypted.  If the memory is not part of the
  * kernel usable area it was accessed and created decrypted, so these
- * areas should be mapped decrypted.
+ * areas should be mapped decrypted. And since the encryption key can
+ * change across reboots, persistent memory should also be mapped
+ * decrypted.
  */
 static bool memremap_should_map_decrypted(resource_size_t phys_addr,
  unsigned long size)
 {
+   int is_pmem;
+
+   /*
+* Check if the address is part of a persistent memory region.
+* This check covers areas added by E820, EFI and ACPI.
+*/
+   is_pmem = region_intersects(phys_addr, size, IORESOURCE_MEM,
+   IORES_DESC_PERSISTENT_MEMORY);
+   if (is_pmem != REGION_DISJOINT)
+   return true;
+
+   /*
+* Check if the non-volatile attribute is set for an EFI
+* reserved area.
+*/
+   if (efi_enabled(EFI_BOOT)) {
+   switch (efi_mem_type(phys_addr)) {
+   case EFI_RESERVED_TYPE:
+   if (efi_mem_attributes(phys_addr) & EFI_MEMORY_NV)
+   return true;
+   break;
+   default:
+   break;
+   }
+   }
+
/* Check if the address is outside kernel usable area */
switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
case E820_TYPE_RESERVED:
case E820_TYPE_ACPI:
case E820_TYPE_NVS:
case E820_TYPE_UNUSABLE:
+   case E820_TYPE_PRAM:


Can't you simply add:

case E820_TYPE_PMEM:

here too and thus get rid of the region_intersects() thing above?

Because, for example, e820_type_to_iores_desc() maps E820_TYPE_PMEM to
IORES_DESC_PERSISTENT_MEMORY so those should be equivalent...


I'll have to double-check this, but I believe that when persistent
memory is identified through the NFIT table it adds it as a resource
but doesn't add it as an e820 entry so I can't rely on the type being
returned as E820_TYPE_PMEM by e820__get_entry_type().

Thanks,
Tom






Re: [PATCH v5 22/32] x86, swiotlb: DMA support for memory encryption

2017-05-19 Thread Tom Lendacky

On 5/16/2017 9:27 AM, Borislav Petkov wrote:

On Tue, Apr 18, 2017 at 04:20:10PM -0500, Tom Lendacky wrote:

Since DMA addresses will effectively look like 48-bit addresses when the
memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
device performing the DMA does not support 48-bits. SWIOTLB will be
initialized to create decrypted bounce buffers for use by these devices.


Use a verb in the subject:

Subject: x86, swiotlb: Add memory encryption support

or similar.


Will do.

Thanks,
Tom






Re: [PATCH v5 23/32] swiotlb: Add warnings for use of bounce buffers with SME

2017-05-19 Thread Tom Lendacky

On 5/16/2017 9:52 AM, Borislav Petkov wrote:

On Tue, Apr 18, 2017 at 04:20:19PM -0500, Tom Lendacky wrote:

Add warnings to let the user know when bounce buffers are being used for
DMA when SME is active.  Since the bounce buffers are not in encrypted
memory, these notifications are to allow the user to determine some
appropriate action - if necessary.

Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/mem_encrypt.h |   11 +++
 include/linux/dma-mapping.h|   11 +++
 include/linux/mem_encrypt.h|6 ++
 lib/swiotlb.c  |3 +++
 4 files changed, 31 insertions(+)

diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index 0637b4b..b406df2 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -26,6 +26,11 @@ static inline bool sme_active(void)
return !!sme_me_mask;
 }

+static inline u64 sme_dma_mask(void)
+{
+   return ((u64)sme_me_mask << 1) - 1;
+}
+
 void __init sme_early_encrypt(resource_size_t paddr,
  unsigned long size);
 void __init sme_early_decrypt(resource_size_t paddr,
@@ -50,6 +55,12 @@ static inline bool sme_active(void)
 {
return false;
 }
+
+static inline u64 sme_dma_mask(void)
+{
+   return 0ULL;
+}
+
 #endif

 static inline void __init sme_early_encrypt(resource_size_t paddr,
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 0977317..f825870 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 

 /**
  * List of possible attributes associated with a DMA mapping. The semantics
@@ -577,6 +578,11 @@ static inline int dma_set_mask(struct device *dev, u64 
mask)

if (!dev->dma_mask || !dma_supported(dev, mask))
return -EIO;
+
+   if (sme_active() && (mask < sme_dma_mask()))
+   dev_warn_ratelimited(dev,
+"SME is active, device will require DMA bounce 
buffers\n");


Bah, no need to break that line - just let it stick out. Ditto for the
others.


Ok.

Thanks,
Tom






Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption

2017-05-19 Thread Borislav Petkov
On Fri, May 19, 2017 at 03:16:51PM -0500, Josh Poimboeuf wrote:
> I'm the stack validation guy, not the stack protection guy :-)

LOL. I thought you were *the* stacks guy. :-)))

But once you've validated it, you could protect it then too. :-)

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME

2017-05-19 Thread Tom Lendacky

On 5/17/2017 2:17 PM, Borislav Petkov wrote:

On Tue, Apr 18, 2017 at 04:21:21PM -0500, Tom Lendacky wrote:

Provide support so that kexec can be used to boot a kernel when SME is
enabled.

Support is needed to allocate pages for kexec without encryption.  This
is needed in order to be able to reboot in the kernel in the same manner
as originally booted.

Additionally, when shutting down all of the CPUs we need to be sure to
flush the caches and then halt. This is needed when booting from a state
where SME was not active into a state where SME is active (or vice-versa).
Without these steps, it is possible for cache lines to exist for the same
physical location but tagged both with and without the encryption bit. This
can cause random memory corruption when caches are flushed depending on
which cacheline is written last.

Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/init.h  |1 +
 arch/x86/include/asm/irqflags.h  |5 +
 arch/x86/include/asm/kexec.h |8 
 arch/x86/include/asm/pgtable_types.h |1 +
 arch/x86/kernel/machine_kexec_64.c   |   35 +-
 arch/x86/kernel/process.c|   26 +++--
 arch/x86/mm/ident_map.c  |   11 +++
 include/linux/kexec.h|   14 ++
 kernel/kexec_core.c  |7 +++
 9 files changed, 101 insertions(+), 7 deletions(-)


...


@@ -86,7 +86,7 @@ static int init_transition_pgtable(struct kimage *image, 
pgd_t *pgd)
set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
}
pte = pte_offset_kernel(pmd, vaddr);
-   set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
+   set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC_NOENC));
return 0;
 err:
free_transition_pgtable(image);
@@ -114,6 +114,7 @@ static int init_pgtable(struct kimage *image, unsigned long 
start_pgtable)
.alloc_pgt_page = alloc_pgt_page,
.context= image,
.pmd_flag   = __PAGE_KERNEL_LARGE_EXEC,
+   .kernpg_flag= _KERNPG_TABLE_NOENC,
};
unsigned long mstart, mend;
pgd_t *level4p;
@@ -597,3 +598,35 @@ void arch_kexec_unprotect_crashkres(void)
 {
kexec_mark_crashkres(false);
 }
+
+int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
+{
+   int ret;
+
+   if (sme_active()) {


if (!sme_active())
return 0;

/*
 * If SME...



Ok.




+   /*
+* If SME is active we need to be sure that kexec pages are
+* not encrypted because when we boot to the new kernel the
+* pages won't be accessed encrypted (initially).
+*/
+   ret = set_memory_decrypted((unsigned long)vaddr, pages);
+   if (ret)
+   return ret;
+
+   if (gfp & __GFP_ZERO)
+   memset(vaddr, 0, pages * PAGE_SIZE);


This function is called after alloc_pages() which already zeroes memory
when __GFP_ZERO is supplied.

If you need to clear the memory *after* set_memory_encrypted() happens,
then you should probably mask out __GFP_ZERO before the alloc_pages()
call so as not to do it twice.


I'll look into that.  I could put the memset() at the end of this
function so that it is done here no matter what.  And update the
default arch_kexec_post_alloc_pages() to also do the memset(). It
just hides the clearing of the pages a bit though by doing that.




+   }
+
+   return 0;
+}
+
+void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
+{
+   if (sme_active()) {
+   /*
+* If SME is active we need to reset the pages back to being
+* an encrypted mapping before freeing them.
+*/
+   set_memory_encrypted((unsigned long)vaddr, pages);
+   }
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 0bb8842..f4e5de6 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -355,8 +356,25 @@ bool xen_set_default_idle(void)
return ret;
 }
 #endif
+
 void stop_this_cpu(void *dummy)
 {
+   bool do_wbinvd_halt = false;
+
+   if (kexec_in_progress && boot_cpu_has(X86_FEATURE_SME)) {
+   /*
+* If we are performing a kexec and the processor supports
+* SME then we need to clear out cache information before
+* halting. With kexec, going from SME inactive to SME active
+* requires clearing cache entries so that addresses without
+* the encryption bit set don't corrupt the same physical
+* address that has the encryption bit set when caches are
+* flushed. Perform a wbinvd followed by a halt to achieve this.

Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption

2017-05-19 Thread Josh Poimboeuf
On Fri, May 19, 2017 at 01:30:05PM +0200, Borislav Petkov wrote:
> > it is called so early. I can get past it by adding:
> > 
> > CFLAGS_mem_encrypt.o := $(nostackp)
> > 
> > in the arch/x86/mm/Makefile, but that obviously eliminates the support
> > for the whole file.  Would it be better to split out the sme_enable()
> > and other boot routines into a separate file or just apply the
> > $(nostackp) to the whole file?
> 
> Josh might have a better idea here... CCed.

I'm the stack validation guy, not the stack protection guy :-)

But there is a way to disable compiler options on a per-function basis
with the gcc __optimize__ function attribute.  For example:

  __attribute__((__optimize__("no-stack-protector")))
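
Attached to just the one early routine, that would look roughly like the
sketch below (assuming the sme_enable() signature from the patch; whether
gcc honours the attribute for code this early would still need checking):

#include <linux/init.h>
#include <asm/bootparam.h>	/* struct boot_params */

extern unsigned long sme_me_mask;

/*
 * Sketch only: scope the attribute to the one early-boot routine instead
 * of building all of mem_encrypt.o with $(nostackp).
 */
unsigned long __init
__attribute__((__optimize__("no-stack-protector")))
sme_enable(struct boot_params *bp)
{
	/* ... existing body of sme_enable() ... */
	return sme_me_mask;
}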

-- 
Josh


Re: [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear

2017-05-19 Thread Tom Lendacky

On 5/18/2017 4:02 AM, Borislav Petkov wrote:

On Wed, May 17, 2017 at 01:54:39PM -0500, Tom Lendacky wrote:

I was worried what the compiler might do when CONFIG_EFI is not set,
but it appears to take care of it. I'll double check though.


There's a efi_enabled() !CONFIG_EFI version too, so should be fine.


I may introduce a length variable to capture data->len right after
paddr_next is set and then have just a single memunmap() call before
the if check.


Yap.


I tried that, but calling an "__init" function (early_memremap()) from
a non "__init" function generated warnings. I suppose I can pass in a
function for the map and unmap but that looks worse to me (also the
unmap functions take different arguments).


No, the other way around: the __init function should call the non-init
one and you need the non-init one anyway for memremap_is_setup_data().



The "worker" function would be doing the loop through the setup data,
but since the setup data is mapped inside the loop I can't do the __init
calling the non-init function and still hope to consolidate the code.
Maybe I'm missing something here...

Thanks,
Tom


This is a chicken-and-egg scenario. In order to determine if
an address is setup data I have to explicitly map the setup data chain
as decrypted. In order to do that I have to supply a flag to explicitly
map the data decrypted, otherwise I wind up back in the
memremap_is_setup_data() function again and again and again...


Oh, fun.




Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME

2017-05-19 Thread Borislav Petkov
On Fri, May 19, 2017 at 03:45:28PM -0500, Tom Lendacky wrote:
> Actually there is.  The above will result in data in the cache because
> halt() turns into a function call if CONFIG_PARAVIRT is defined (refer
> to the comment above where do_wbinvd_halt is set to true). I could make
> this a native_wbinvd() and native_halt()

That's why we have the native_* versions - to bypass paravirt crap.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME

2017-05-19 Thread Tom Lendacky

On 5/19/2017 3:58 PM, Borislav Petkov wrote:

On Fri, May 19, 2017 at 03:45:28PM -0500, Tom Lendacky wrote:

Actually there is.  The above will result in data in the cache because
halt() turns into a function call if CONFIG_PARAVIRT is defined (refer
to the comment above where do_wbinvd_halt is set to true). I could make
this a native_wbinvd() and native_halt()


That's why we have the native_* versions - to bypass paravirt crap.


As long as those never change from static inline everything will be
fine. I can change it, but I really like how it explicitly indicates
what is needed in this case. Even if the function gets changed from
static inline the fact that the instructions are sequential in the
function covers that case.

Thanks,
Tom






Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME

2017-05-19 Thread Borislav Petkov
On Fri, May 19, 2017 at 04:07:24PM -0500, Tom Lendacky wrote:
> As long as those never change from static inline everything will be
> fine. I can change it, but I really like how it explicitly indicates

I know what you want to do. But you're practically defining a helper
which contains two arbitrary instructions which probably no one else
will need.

So how about we simplify this function even more. We don't need to pay
attention to kexec being in progress because we're halting anyway so who
cares how fast we halt.

Might have to state that in the comment below though, instead of what's
there now.

And for the exact same moot reason, we don't need to look at SME CPUID
feature - we can just as well WBINVD unconditionally.

void stop_this_cpu(void *dummy)
{
local_irq_disable();
/*
 * Remove this CPU:
 */
set_cpu_online(smp_processor_id(), false);
disable_local_APIC();
mcheck_cpu_clear(this_cpu_ptr(&cpu_info));

for (;;) {
/*
 * If we are performing a kexec and the processor supports
 * SME then we need to clear out cache information before
 * halting. With kexec, going from SME inactive to SME active
 * requires clearing cache entries so that addresses without
 * the encryption bit set don't corrupt the same physical
 * address that has the encryption bit set when caches are
 * flushed. Perform a wbinvd followed by a halt to achieve
 * this.
 */
asm volatile("wbinvd; hlt" ::: "memory");
}
}

How's that?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME

2017-05-19 Thread Tom Lendacky

On 5/19/2017 4:28 PM, Borislav Petkov wrote:

On Fri, May 19, 2017 at 04:07:24PM -0500, Tom Lendacky wrote:

As long as those never change from static inline everything will be
fine. I can change it, but I really like how it explicitly indicates


I know what you want to do. But you're practically defining a helper
which contains two arbitrary instructions which probably no one else
will need.

So how about we simplify this function even more. We don't need to pay
attention to kexec being in progress because we're halting anyway so who
cares how fast we halt.

Might have to state that in the comment below though, instead of what's
there now.

And for the exact same moot reason, we don't need to look at SME CPUID
feature - we can just as well WBINVD unconditionally.

void stop_this_cpu(void *dummy)
{
local_irq_disable();
/*
 * Remove this CPU:
 */
set_cpu_online(smp_processor_id(), false);
disable_local_APIC();
mcheck_cpu_clear(this_cpu_ptr(&cpu_info));

for (;;) {
/*
 * If we are performing a kexec and the processor supports
 * SME then we need to clear out cache information before
 * halting. With kexec, going from SME inactive to SME active
 * requires clearing cache entries so that addresses without
 * the encryption bit set don't corrupt the same physical
 * address that has the encryption bit set when caches are
 * flushed. Perform a wbinvd followed by a halt to achieve
 * this.
 */
asm volatile("wbinvd; hlt" ::: "memory");
}
}

How's that?


I can live with that!

Thanks,
Tom



