[PATCH 1/1] iommu/vt-d: Handle memory shortage on pasid table allocation

2018-08-31 Thread Lu Baolu
Pasid table memory allocation could return failure due to memory
shortage. Limit the pasid table size to 1MiB because the current 8MiB
contiguous physical memory allocation can be hard to come by. Without
a PASID table, the device can continue to work with only shared
virtual memory impacted. So, let's go ahead with context mapping even
if the memory allocation for the pasid table failed.

Fixes: cc580e41260d ("iommu/vt-d: Per PCI device pasid table interfaces")
Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Mika Westerberg 
Reported-and-tested-by: Pelton Kyle D 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 6 +++---
 drivers/iommu/intel-pasid.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5f3f10cf9d9d..bedc801b06a0 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2540,9 +2540,9 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
if (dev && dev_is_pci(dev) && info->pasid_supported) {
ret = intel_pasid_alloc_table(dev);
if (ret) {
-   __dmar_remove_one_dev_info(info);
-   spin_unlock_irqrestore(&device_domain_lock, flags);
-   return NULL;
+   pr_warn("No pasid table for %s, pasid disabled\n",
+   dev_name(dev));
+   info->pasid_supported = 0;
}
}
spin_unlock_irqrestore(&device_domain_lock, flags);
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index 1c05ed6fc5a5..1fb5e12b029a 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -11,7 +11,7 @@
 #define __INTEL_PASID_H
 
 #define PASID_MIN  0x1
-#define PASID_MAX  0x100000
+#define PASID_MAX  0x20000
 
 struct pasid_entry {
u64 val;
-- 
2.17.1



Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing

2018-08-31 Thread Xu Zaibo

Hi Jean,

On 2018/8/31 21:34, Jean-Philippe Brucker wrote:

> On 27/08/18 09:06, Xu Zaibo wrote:
>>> +struct vfio_iommu_type1_bind_process {
>>> +    __u32    flags;
>>> +#define VFIO_IOMMU_BIND_PID    (1 << 0)
>>> +    __u32    pasid;
>>
>> As I am doing some work on the SVA patch set, I wonder why user
>> space needs this pasid. Would it be more reasonable to set the pasid
>> into all devices under the vfio container via a callback from
>> 'vfio_devices' while the 'VFIO_IOMMU_BIND_PROCESS' command is
>> executed in kernel land? I am not sure, because no suitable callback
>> exists in 'vfio_device' at present.
>
> When using vfio-pci, the kernel doesn't know how to program the PASID
> into the device because the only kernel driver for the device is the
> generic vfio-pci module. The PCI specification doesn't describe a way
> of setting up the PASID, it's vendor-specific. Only the userspace
> application owning the device (and calling VFIO_IOMMU_BIND) knows how
> to do it, so we return the allocated PASID.
>
> Note that unlike vfio-mdev where applications share slices of a
> function, with vfio-pci one application owns the whole function so
> it's safe to let userspace set the PASID in hardware. With vfio-mdev
> it's the kernel driver that should be in charge of setting the PASID
> as you described, and we wouldn't have a reason to return the PASID
> in the vfio_iommu_type1_bind_process structure.

Understood. But I still have the following confusion:

As one application takes the whole function while using vfio-pci, why
do the application and the function need to enable the PASID
capability? (Since just one I/O page table is enough for them.)

Maybe I misunderstood; I hope you can help me clear it up. Thank you
very much.

Thanks,
Zaibo






Re: your mail

2018-08-31 Thread Paul Burton
Hi Christoph,

On Mon, Aug 27, 2018 at 04:50:27PM +0200, Christoph Hellwig wrote:
> Subject: [RFC] merge dma_direct_ops and dma_noncoherent_ops
> 
> While most architectures are either always or never dma coherent for a
> given build, the arm, arm64, mips and soon arc architectures can have
> different dma coherent settings on a per-device basis.  Additionally
> some mips builds can decide at boot time if dma is coherent or not.
> 
> I've started to look into handling noncoherent dma in swiotlb, and
> moving the dma-iommu ops into common code [1], and for that we need a
> generic way to check if a given device is coherent or not.  Moving
> this flag into struct device also simplifies the conditionally coherent
> architecture implementations.
> 
> These patches are also available in a git tree given that they have
> a few previous posted dependencies:
> 
> git://git.infradead.org/users/hch/misc.git dma-direct-noncoherent-merge
> 
> Gitweb:
> 
> 
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-direct-noncoherent-merge

Apart from the nits in patch 2, these look sane to me from a MIPS
perspective, so for patches 1-4:

Acked-by: Paul Burton  # MIPS parts

Thanks,
Paul


Re: [PATCH 2/5] dma-mapping: move the dma_coherent flag to struct device

2018-08-31 Thread Paul Burton
Hi Christoph,

On Mon, Aug 27, 2018 at 04:50:29PM +0200, Christoph Hellwig wrote:
> Various architectures support both coherent and non-coherent dma on
> a per-device basis.  Move the dma_noncoherent flag from mips the
> mips archdata field to struct device proper to prepare the
> infrastructure for reuse on other architectures.

"mips the mips"

> diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
> index a0aa00cc909d..f99748e9c08e 100644
> --- a/include/linux/dma-noncoherent.h
> +++ b/include/linux/dma-noncoherent.h
> @@ -4,6 +4,24 @@
>  
>  #include 
>  
> +#ifdef CONFIG_ARCH_HAS_DMA_COHERENCE_H
> +#include 
> +#else
> +#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
> +defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
> +defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)

#elif ?
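
i.e. presumably something like this (a sketch; assuming the header
guarded by CONFIG_ARCH_HAS_DMA_COHERENCE_H is <asm/dma-coherence.h>):

#ifdef CONFIG_ARCH_HAS_DMA_COHERENCE_H
#include <asm/dma-coherence.h>	/* assumed header name */
#elif defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
    defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
    defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
...
#endif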

Thanks,
Paul


[PATCH v3] of/platform: initialise AMBA default DMA masks

2018-08-31 Thread Linus Walleij
This addresses a v4.19-rc1 regression in the PL111 DRM driver
in drivers/gpu/pl111/*

The driver uses the CMA KMS helpers and will thus at some
point call down to dma_alloc_attrs() to allocate a chunk
of contiguous DMA memory for the framebuffer.

It appears that in v4.18, it was OK that this (and other
DMA mastering AMBA devices) left dev->coherent_dma_mask
blank (zero).

In v4.19-rc1 the WARN_ON_ONCE(dev && !dev->coherent_dma_mask)
in dma_alloc_attrs() in include/linux/dma-mapping.h is
triggered. The allocation later fails when get_coherent_dma_mask()
is called from __dma_alloc() and __dma_alloc() returns
NULL:

drm-clcd-pl111 dev:20: coherent DMA mask is unset
drm-clcd-pl111 dev:20: [drm:drm_fb_helper_fbdev_setup] *ERROR*
Failed to set fbdev configuration

It turns out that in commit 4d8bde883bfb
("OF: Don't set default coherent DMA mask")
the OF core stops setting the default DMA mask on new devices,
specifically these lines of the patch:

- if (!dev->coherent_dma_mask)
-   dev->coherent_dma_mask = DMA_BIT_MASK(32);

Robin Murphy solved a similar problem in
a5516219b102 ("of/platform: Initialise default DMA masks")
by simply assigning dev.coherent_dma_mask and the
dev.dma_mask to point to the same mask when creating
devices from the device tree; introducing the same code
into the code path creating AMBA/PrimeCell devices
solved my problem, and graphics now come up.

The code simply assumes that the device can access all
of the system memory by setting the coherent DMA mask
to 0xffffffff when creating a device from the device
tree, which is crude, but seems to be what kernel v4.18
assumed.

The AMBA PrimeCells do not distinguish between coherent
and streaming DMA, so we can just assign the same value
to both DMA masks.

Possibly drivers should augment their coherent DMA mask
in accordance with "dma-ranges" from the device tree
if more fine-grained masking is needed.
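
For example (sketch only; the 34-bit width is made up and "adev" is
the probed amba_device), such a driver could do:

	ret = dma_set_mask_and_coherent(&adev->dev, DMA_BIT_MASK(34));
	if (ret)
		dev_warn(&adev->dev, "no 34-bit DMA, keeping 32-bit mask\n");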

Reported-by: Russell King 
Fixes: 4d8bde883bfb ("OF: Don't set default coherent DMA mask")
Cc: Russell King 
Cc: Christoph Hellwig 
Cc: Robin Murphy 
Signed-off-by: Linus Walleij 
---
ChangeLog v2->v3:
- Provide proper root cause analysis, point to the right
  offending commit with Fixes:
- Make even more elaborate description of what is causing
  this.
ChangeLog v1->v2:
- Provide a better description for the change.
---
 drivers/of/platform.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 7ba90c290a42..6c59673933e9 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -241,6 +241,10 @@ static struct amba_device *of_amba_device_create(struct device_node *node,
if (!dev)
goto err_clear_flag;
 
+   /* AMBA devices only support a single DMA mask */
+   dev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
+   dev->dev.dma_mask = &dev->dev.coherent_dma_mask;
+
/* setup generic device info */
dev->dev.of_node = of_node_get(node);
dev->dev.fwnode = &node->fwnode;
-- 
2.17.1



Re: [RFC 09/13] iommu/smmuv3: Get prepared for nested stage support

2018-08-31 Thread Auger Eric
Hi Jean-Philippe,

On 08/31/2018 03:20 PM, Jean-Philippe Brucker wrote:
> On 23/08/18 13:17, Eric Auger wrote:
>>  if (ste->s1_cfg) {
>> -BUG_ON(ste_live);
> 
> Scary! :) The current code assumes that it can make modifications to the
> STE in any order and enable translation after a sync. So far I haven't
> been able to find anything that violates this rule in the spec: "If
> software modifies the structure while it is valid, it must not allow the
> structure to enter an invalid intermediate state." So maybe it's fine,
> though to be safe I would have started with disabling the STE
> (http://www.linux-arm.org/git?p=linux-jpb.git;a=commitdiff;h=936e49f923e101c061269eadd5fa43fef819d2e9)

Yep, this works with my setup, but I was waiting for this kind of
comment to turn the prototype into something more "production level"
;-) Did you send anything upstream related to this branch?

Thanks

Eric
> 
> Thanks,
> Jean
> 


Re: [RFC 02/13] iommu: Introduce tlb_invalidate API

2018-08-31 Thread Auger Eric
Hi Jean-Philippe,

On 08/31/2018 03:17 PM, Jean-Philippe Brucker wrote:
> On 23/08/18 13:17, Eric Auger wrote:
>> +/**
>> + * Translation cache invalidation information, contains generic IOMMU
>> + * data which can be parsed based on model ID by model specific drivers.
>> + * Since the invalidation of second level page tables are included in the
>> + * unmap operation, this info is only applicable to the first level
>> + * translation caches, i.e. DMA request with PASID.
>> + *
>> + * @granularity:requested invalidation granularity, type dependent
>> + * @size:   2^size of 4K pages, 0 for 4k, 9 for 2MB, etc.
>> + * @nr_pages:   number of pages to invalidate
>> + * @pasid:  processor address space ID value per PCI spec.
>> + * @addr:   page address to be invalidated
>> + * @flags   IOMMU_INVALIDATE_ADDR_LEAF: leaf paging entries
>> + *  IOMMU_INVALIDATE_GLOBAL_PAGE: global pages
>> + *
>> + */
>> +struct iommu_tlb_invalidate_info {
>> +struct iommu_tlb_invalidate_hdr hdr;
>> +enum iommu_inv_granularity  granularity;
>> +__u32   flags;
>> +#define IOMMU_INVALIDATE_ADDR_LEAF  (1 << 0)
>> +#define IOMMU_INVALIDATE_GLOBAL_PAGE(1 << 1)
>> +__u8size;
>> +__u64   nr_pages;
>> +__u32   pasid;
>> +__u64   addr;
>> +};
>>  #endif /* _UAPI_IOMMU_H */
> 
> Since the ioctl will be used to combine invalidations (invalidate both
> ATC and TLB with a single call), we need an additional ASID field for
> the SMMU - ATC is invalidated by PASID, TLB by ASID. I used to call it
> "tag", but I'm leaning towards "arch_id" now
> (http://www.linux-arm.org/git?p=linux-jpb.git;a=commitdiff;h=40fdef74816dd8d8d113100b9e0162fab4cec28d)

I acknowledge I am not crystal clear about that. For a given
iommu_domain, don't you have a single asid? Can't you retrieve the
asid from the
iommu_domain/arm_smmu_domain/arm_smmu_s1_cfg/arm_smmu_ctx_desc.asid?
Here again I am confused about the dual iommu_domain/struct device
parameters.

I have another problem with the QEMU integration.
When the guest does an NH_ALL, this propagates an invalidation over
the whole IPA range, and we must discriminate that from regular NH_VA
calls. How would you encode NH_ALL with this API?

Besides, I see you are working on virtio-iommu/stage2 enablement ;-)

Thanks

Eric
> 
> Thanks,
> Jean
> 


Re: [RFC 01/13] iommu: Introduce bind_guest_stage API

2018-08-31 Thread Auger Eric
Hi Jean-Philippe,

On 08/31/2018 03:11 PM, Jean-Philippe Brucker wrote:
> Hi Eric,
> 
> On 23/08/18 16:25, Auger Eric wrote:
>>> +int iommu_bind_guest_stage(struct iommu_domain *domain, struct device *dev,
>>> +  struct iommu_guest_stage_config *cfg)
> 
> About the name change from iommu_bind_pasid_table: is the intent to
> reuse this API for SMMUv2, which supports nested but not PASID? Seems
> like a good idea but "iommu_bind_table" may be better since "stage" is
> only used by Arm.

At the moment I don't target SMMUv2, just SMMUv3. My focus was on
nested stage enablement without enabling the multi-CD feature (PASID),
which is not supported by the QEMU vSMMUv3. Afterwards I realized that
basically we are pointing to either a CD or a PASID table, and that's
about the same. I don't have a strong opinion on the name;
iommu_bind_guest_table or iommu_bind_pasid_table would be fine with
me. Indeed "stage" is Arm terminology ("level" for Intel?)
> 
>>> +/**
>>> + * PASID table data used to bind guest PASID table to the host IOMMU. This will
>>> + * enable guest managed first level page tables.
>>> + * @version: for future extensions and identification of the data format
>>> + * @bytes: size of this structure
>>> + * @base_ptr:  PASID table pointer
>>> + * @pasid_bits: number of bits supported in the guest PASID table, must be less
>>> + * or equal than the host supported PASID size.
>>> + */
>>> +struct iommu_pasid_table_config {
>>> +   __u32 version;
>>> +#define PASID_TABLE_CFG_VERSION_1 1
>>> +   __u32 bytes;
>>> +   __u64 base_ptr;
>>> +   __u8 pasid_bits;
>>> +};
>>> +
>>> +/**
>>> + * Stream Table Entry S1 info
>>> + * @disabled: the smmu is disabled
>>> + * @bypassed: stage 1 is bypassed
>>> + * @aborted: aborted
>>> + * @cdptr_dma: GPA of the Context Descriptor
>>> + * @asid: address space identified associated to the CD
>>> + */
>>> +struct iommu_smmu_s1_config {
>>> +   __u8 disabled;
>>> +   __u8 bypassed;
>>> +   __u8 aborted;
>>> +   __u64 cdptr_dma;
>>> +   __u16 asid;
>> This asid field is a leftover. asid is part of the CD (owned by the
>> guest) and cannot be conveyed through this API. What is relevant to
>> pass is the guest asid_bits (vSMMU IDR1 info), to be compared with the
>> host one. So we end up having something similar to the base_ptr and
>> pasid_bits of iommu_pasid_table_config.
> 
> That part seems strange. Userspace wouldn't try arbitrary ASID sizes
> until one fits, especially since ASID size isn't the only parameter:
> there will be SSID, IOVA, IPA sizes, HTTU features and I guess any other
> SMMU_IDRx bit that corresponds to a field in the CD or STE. Doesn't
> vSMMU need to tailor its IDR registers to whatever is supported by the
> host SMMU, in order to use nested translation?
Yes I agree this is needed anyway.
> 
> Before creating the virtual ID registers, userspace will likely want to
> know more about the physical IOMMU, by querying the kernel. I wrote
> something about that interface recently:
> https://www.spinics.net/lists/iommu/msg28039.html

Ah OK, I missed that part of the discussion. About the sysfs API: we
devised one in the past for reserved_regions. I thought it would be
useful for QEMU to identify them, but eventually Alex pushed to create
a VFIO API instead to make things more self-contained. We would need
to double-check with him.
> 
> And if you provide this interface, checking the parameters again in
> BIND_GUEST_STAGE isn't very useful, it only tells you that userspace
> read your values. Once the guest writes the CD, it could still use the
> wrong ASID size or some other invalid config. That would be caught by
> the SMMU walker and cause a C_BAD_CD error report.

Yes, actually if there is some mismatch we might get BAD_STE/BAD_CD
events. This might be another way to handle things. I did not
integrate the fault API yet. This exercise is needed to understand how
we can catch things at the QEMU level.
> 
>> Otherwise, the other fields (disable, bypassed, aborted) can be packed
>> as masks in a _u8 flag.
> 
> What's the difference between aborted and disabled? I'm also not sure we
> should give such fine tweaks to userspace, since the STE code is already
> pretty complicated and has to deal with several state transitions.
> Existing behavior (1) (2) (5) probably needs to be preserved:
> 
> (1) Initially no container is set, s1-translate s2-bypass with the
> default domain (non-VFIO)
> (2) A VFIO_TYPE1_NESTING_IOMMU container is set, s1-bypass s2-translate
> (3) BIND_GUEST_STAGE, s1-translate s2-translate
> (4) UNBIND_GUEST_STAGE, s1-bypass s2-translate
> (5) Remove container, s1-translate s2-bypass with the default domain
> 
> That said, when instantiating a vSMMU, QEMU may want to switch from (2)
> to an intermediate "s1-abort s2-translate" state. At the moment with
> virtio-iommu I only create the stage-2 mappings at (3). This ensures
> that transactions abort until the guest configures the vIOMMU and it
> keeps the host driver simple,

Re: [PATCH v2 13/40] vfio: Add support for Shared Virtual Addressing

2018-08-31 Thread Jean-Philippe Brucker
Hi Zaibo,

On 27/08/18 09:06, Xu Zaibo wrote:
>> +struct vfio_iommu_type1_bind_process {
>> +    __u32    flags;
>> +#define VFIO_IOMMU_BIND_PID    (1 << 0)
>> +    __u32    pasid;
> As I am doing some work on the SVA patch set, I wonder why user
> space needs this pasid. Would it be more reasonable to set the pasid
> into all devices under the vfio container via a callback from
> 'vfio_devices' while the 'VFIO_IOMMU_BIND_PROCESS' command is
> executed in kernel land? I am not sure, because no suitable callback
> exists in 'vfio_device' at present.

When using vfio-pci, the kernel doesn't know how to program the PASID
into the device because the only kernel driver for the device is the
generic vfio-pci module. The PCI specification doesn't describe a way of
setting up the PASID, it's vendor-specific. Only the userspace
application owning the device (and calling VFIO_IOMMU_BIND) knows how to
do it, so we return the allocated PASID.

Note that unlike vfio-mdev where applications share slices of a
function, with vfio-pci one application owns the whole function so it's
safe to let userspace set the PASID in hardware. With vfio-mdev it's the
kernel driver that should be in charge of setting the PASID as you
described, and we wouldn't have a reason to return the PASID in the
vfio_iommu_type1_bind_process structure.
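
For example, userspace could do something like this (sketch only:
container_fd, bar and DEV_PASID_REG are placeholders, and the ioctl
argument layout is simplified from the actual series):

	struct vfio_iommu_type1_bind_process bind = { .flags = 0 };

	/* the kernel allocates a PASID for this process and returns it */
	if (ioctl(container_fd, VFIO_IOMMU_BIND, &bind) < 0)
		err(1, "VFIO_IOMMU_BIND");

	/* only this application knows its device's PASID register */
	*(volatile uint32_t *)(bar + DEV_PASID_REG) = bind.pasid;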

Thanks,
Jean


Re: [RFC 09/13] iommu/smmuv3: Get prepared for nested stage support

2018-08-31 Thread Jean-Philippe Brucker
On 23/08/18 13:17, Eric Auger wrote:
>   if (ste->s1_cfg) {
> - BUG_ON(ste_live);

Scary! :) The current code assumes that it can make modifications to the
STE in any order and enable translation after a sync. So far I haven't
been able to find anything that violates this rule in the spec: "If
software modifies the structure while it is valid, it must not allow the
structure to enter an invalid intermediate state." So maybe it's fine,
though to be safe I would have started with disabling the STE
(http://www.linux-arm.org/git?p=linux-jpb.git;a=commitdiff;h=936e49f923e101c061269eadd5fa43fef819d2e9)

Thanks,
Jean


Re: [RFC 02/13] iommu: Introduce tlb_invalidate API

2018-08-31 Thread Jean-Philippe Brucker
On 23/08/18 13:17, Eric Auger wrote:
> +/**
> + * Translation cache invalidation information, contains generic IOMMU
> + * data which can be parsed based on model ID by model specific drivers.
> + * Since the invalidation of second level page tables are included in the
> + * unmap operation, this info is only applicable to the first level
> + * translation caches, i.e. DMA request with PASID.
> + *
> + * @granularity: requested invalidation granularity, type dependent
> + * @size:2^size of 4K pages, 0 for 4k, 9 for 2MB, etc.
> + * @nr_pages:number of pages to invalidate
> + * @pasid:   processor address space ID value per PCI spec.
> + * @addr:page address to be invalidated
> + * @flagsIOMMU_INVALIDATE_ADDR_LEAF: leaf paging entries
> + *   IOMMU_INVALIDATE_GLOBAL_PAGE: global pages
> + *
> + */
> +struct iommu_tlb_invalidate_info {
> + struct iommu_tlb_invalidate_hdr hdr;
> + enum iommu_inv_granularity  granularity;
> + __u32   flags;
> +#define IOMMU_INVALIDATE_ADDR_LEAF   (1 << 0)
> +#define IOMMU_INVALIDATE_GLOBAL_PAGE (1 << 1)
> + __u8size;
> + __u64   nr_pages;
> + __u32   pasid;
> + __u64   addr;
> +};
>  #endif /* _UAPI_IOMMU_H */

Since the ioctl will be used to combine invalidations (invalidate both
ATC and TLB with a single call), we need an additional ASID field for
the SMMU - ATC is invalidated by PASID, TLB by ASID. I used to call it
"tag", but I'm leaning towards "arch_id" now
(http://www.linux-arm.org/git?p=linux-jpb.git;a=commitdiff;h=40fdef74816dd8d8d113100b9e0162fab4cec28d)
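
For instance, a caller might fill the structure like this for a single
2MB leaf invalidation (a sketch; the granularity value is omitted
since the enum's members aren't shown here, and the new field would
carry the ASID alongside .pasid):

	struct iommu_tlb_invalidate_info info = {
		.flags    = IOMMU_INVALIDATE_ADDR_LEAF,
		.size     = 9,		/* 2^9 * 4K = 2MB pages */
		.nr_pages = 1,		/* one 2MB entry */
		.pasid    = pasid,
		.addr     = iova,	/* page-aligned address */
	};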

Thanks,
Jean


Re: [RFC 01/13] iommu: Introduce bind_guest_stage API

2018-08-31 Thread Jean-Philippe Brucker
Hi Eric,

On 23/08/18 16:25, Auger Eric wrote:
>> +int iommu_bind_guest_stage(struct iommu_domain *domain, struct device *dev,
>> +   struct iommu_guest_stage_config *cfg)

About the name change from iommu_bind_pasid_table: is the intent to
reuse this API for SMMUv2, which supports nested but not PASID? Seems
like a good idea but "iommu_bind_table" may be better since "stage" is
only used by Arm.

>> +/**
>> + * PASID table data used to bind guest PASID table to the host IOMMU. This will
>> + * enable guest managed first level page tables.
>> + * @version: for future extensions and identification of the data format
>> + * @bytes: size of this structure
>> + * @base_ptr:   PASID table pointer
>> + * @pasid_bits: number of bits supported in the guest PASID table, must be less
>> + *  or equal than the host supported PASID size.
>> + */
>> +struct iommu_pasid_table_config {
>> +__u32 version;
>> +#define PASID_TABLE_CFG_VERSION_1 1
>> +__u32 bytes;
>> +__u64 base_ptr;
>> +__u8 pasid_bits;
>> +};
>> +
>> +/**
>> + * Stream Table Entry S1 info
>> + * @disabled: the smmu is disabled
>> + * @bypassed: stage 1 is bypassed
>> + * @aborted: aborted
>> + * @cdptr_dma: GPA of the Context Descriptor
>> + * @asid: address space identified associated to the CD
>> + */
>> +struct iommu_smmu_s1_config {
>> +__u8 disabled;
>> +__u8 bypassed;
>> +__u8 aborted;
>> +__u64 cdptr_dma;
>> +__u16 asid;
> This asid field is a leftover. asid is part of the CD (owned by the
> guest) and cannot be conveyed through this API. What is relevant to
> pass is the guest asid_bits (vSMMU IDR1 info), to be compared with the
> host one. So we end up having something similar to the base_ptr and
> pasid_bits of iommu_pasid_table_config.

That part seems strange. Userspace wouldn't try arbitrary ASID sizes
until one fits, especially since ASID size isn't the only parameter:
there will be SSID, IOVA, IPA sizes, HTTU features and I guess any other
SMMU_IDRx bit that corresponds to a field in the CD or STE. Doesn't
vSMMU need to tailor its IDR registers to whatever is supported by the
host SMMU, in order to use nested translation?

Before creating the virtual ID registers, userspace will likely want to
know more about the physical IOMMU, by querying the kernel. I wrote
something about that interface recently:
https://www.spinics.net/lists/iommu/msg28039.html

And if you provide this interface, checking the parameters again in
BIND_GUEST_STAGE isn't very useful, it only tells you that userspace
read your values. Once the guest writes the CD, it could still use the
wrong ASID size or some other invalid config. That would be caught by
the SMMU walker and cause a C_BAD_CD error report.

> Otherwise, the other fields (disable, bypassed, aborted) can be packed
> as masks in a _u8 flag.

What's the difference between aborted and disabled? I'm also not sure we
should give such fine tweaks to userspace, since the STE code is already
pretty complicated and has to deal with several state transitions.
Existing behavior (1) (2) (5) probably needs to be preserved:

(1) Initially no container is set, s1-translate s2-bypass with the
default domain (non-VFIO)
(2) A VFIO_TYPE1_NESTING_IOMMU container is set, s1-bypass s2-translate
(3) BIND_GUEST_STAGE, s1-translate s2-translate
(4) UNBIND_GUEST_STAGE, s1-bypass s2-translate
(5) Remove container, s1-translate s2-bypass with the default domain

That said, when instantiating a vSMMU, QEMU may want to switch from (2)
to an intermediate "s1-abort s2-translate" state. At the moment with
virtio-iommu I only create the stage-2 mappings at (3). This ensures
that transactions abort until the guest configures the vIOMMU and it
keeps the host driver simple, but it has the downside of pinning guest
memory lazily (necessary with virtio-iommu since the guest falls back to
the map/unmap interface, which doesn't pin all memory upfront, if it
can't support nested).

> Anyway this API still is at the stage of proof of concept. At least it
> shows the similarities between that use case and the PASID one and
> should encourage to discuss about the relevance to have a unified API to
> pass the guest stage info.
Other required fields in the BIND_GUEST_STAGE ioctl are at least s1dss,
s1cdmax and s1fmt. They are not for sanity-checking; they describe the guest
configuration. The guest selects a PASID table format (1-level, 2-level
4k/64k) which is written into STE.s1fmt. It also selects the behavior of
requests-without-pasid, which is written into STE.s1dss. s1cdmax
corresponds to pasid_bits.
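
Roughly, the structure could grow into something like this (a sketch;
field names and widths are illustrative, not from the posted RFC):

	struct iommu_smmu_s1_config {
		__u64	cdptr_dma;	/* GPA of the CD (PASID) table */
		__u8	s1fmt;		/* CD table format: linear, 2-level 4k/64k */
		__u8	s1dss;		/* behavior of requests-without-pasid */
		__u8	s1cdmax;	/* table size, i.e. pasid_bits */
	};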

Thanks,
Jean


RE: [PATCH 0/7 v6] Support for fsl-mc bus and its devices in SMMU

2018-08-31 Thread Nipun Gupta
Hi Joerg/Robin,

Can you please let me know when these patches will be applied to the tree?
Is there anything else pending from my side?

Thanks,
Nipun

> -Original Message-
> From: Nipun Gupta
> Sent: Monday, July 9, 2018 4:48 PM
> To: robin.mur...@arm.com; will.dea...@arm.com; robh...@kernel.org;
> r...@kernel.org; mark.rutl...@arm.com; catalin.mari...@arm.com;
> gre...@linuxfoundation.org; Laurentiu Tudor ;
> bhelg...@google.com; h...@lst.de
> Cc: j...@8bytes.org; m.szyprow...@samsung.com; shawn...@kernel.org;
> frowand.l...@gmail.com; iommu@lists.linux-foundation.org; linux-
> ker...@vger.kernel.org; devicet...@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org; linuxppc-...@lists.ozlabs.org; linux-
> p...@vger.kernel.org; Bharat Bhushan ;
> stuyo...@gmail.com; Leo Li ; Nipun Gupta
> 
> Subject: [PATCH 0/7 v6] Support for fsl-mc bus and its devices in SMMU
> 
> This patchset defines IOMMU DT binding for fsl-mc bus and adds
> support in SMMU for fsl-mc bus.
> 
> The patch series is based on top of dma-mapping tree (for-next branch):
> http://git.infradead.org/users/hch/dma-mapping.git
> 
> These patches
>   - Define property 'iommu-map' for fsl-mc bus (patch 1)
>   - Integrates the fsl-mc bus with the SMMU using this
> IOMMU binding (patch 2,3,4)
>   - Adds the dma configuration support for fsl-mc bus (patch 5, 6)
>   - Updates the fsl-mc device node with iommu/dma related changes (patch 7)
> 
> Changes in v2:
>   - use iommu-map property for fsl-mc bus
>   - rebase over patchset https://patchwork.kernel.org/patch/10317337/
> and make corresponding changes for dma configuration of devices on
> fsl-mc bus
> 
> Changes in v3:
>   - move of_map_rid in drivers/of/address.c
> 
> Changes in v4:
>   - move of_map_rid in drivers/of/base.c
> 
> Changes in v5:
>   - break patch 5 in two separate patches (now patch 5/7 and patch 6/7)
>   - add changelog text in patch 3/7 and patch 5/7
>   - typo fix
> 
> Changes in v6:
>   - Updated fsl_mc_device_group() API to be more rational
>   - Added dma-coherent property in the LS2 smmu device node
>   - Minor fixes in the device-tree documentation
> 
> Nipun Gupta (7):
>   Documentation: fsl-mc: add iommu-map device-tree binding for fsl-mc
> bus
>   iommu/of: make of_pci_map_rid() available for other devices too
>   iommu/of: support iommu configuration for fsl-mc devices
>   iommu/arm-smmu: Add support for the fsl-mc bus
>   bus: fsl-mc: support dma configure for devices on fsl-mc bus
>   bus: fsl-mc: set coherent dma mask for devices on fsl-mc bus
>   arm64: dts: ls208xa: comply with the iommu map binding for fsl_mc
> 
>  .../devicetree/bindings/misc/fsl,qoriq-mc.txt  |  39 
>  arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi |   7 +-
>  drivers/bus/fsl-mc/fsl-mc-bus.c|  16 +++-
>  drivers/iommu/arm-smmu.c   |   7 ++
>  drivers/iommu/iommu.c  |  13 +++
>  drivers/iommu/of_iommu.c   |  25 -
>  drivers/of/base.c  | 102 
> +
>  drivers/of/irq.c   |   5 +-
>  drivers/pci/of.c   | 101 
>  include/linux/fsl/mc.h |   8 ++
>  include/linux/iommu.h  |   2 +
>  include/linux/of.h |  11 +++
>  include/linux/of_pci.h |  10 --
>  13 files changed, 224 insertions(+), 122 deletions(-)
> 
> --
> 1.9.1



Re: [PATCH v2] of/platform: initialise AMBA default DMA masks

2018-08-31 Thread Russell King - ARM Linux
On Fri, Aug 31, 2018 at 10:26:14AM +0200, Linus Walleij wrote:
> This addresses a v4.19-rc1 regression in the PL111 DRM driver
> in drivers/gpu/pl111/*
> 
> The driver uses the CMA KMS helpers and will thus at some
> point call down to dma_alloc_attrs() to allocate a chunk
> of contiguous DMA memory for the framebuffer.
> 
> It appears that in v4.18, it was OK that this (and other
> DMA mastering AMBA devices) left dev->coherent_dma_mask
> blank (zero).
> 
> In v4.19-rc1 the WARN_ON_ONCE(dev && !dev->coherent_dma_mask)
> in dma_alloc_attrs() in include/linux/dma-mapping.h is
> triggered. The allocation later fails when get_coherent_dma_mask()
> is called from __dma_alloc() and __dma_alloc() returns
> NULL:
> 
> drm-clcd-pl111 dev:20: coherent DMA mask is unset
> drm-clcd-pl111 dev:20: [drm:drm_fb_helper_fbdev_setup] *ERROR*
>   Failed to set fbdev configuration
> 
> I have not been able to properly bisect down to the actual
> commit causing it because the kernel simply fails to build
> at too many bisection points, pushing the bisect back to places
> like the merge of entire subsystem trees, where things have
> likely been resolved while merging them.
> 
> I found that Robin Murphy solved a similar problem in
> a5516219b102 ("of/platform: Initialise default DMA masks")
> by simply assigning dev.coherent_dma_mask and the
> dev.dma_mask to point to the same when creating devices
> from the device tree, and introducing the same code into
> the code path creating AMBA/PrimeCell devices solved my
> problem, graphics now come up.
> 
> The code simply assumes that the device can access all
> of the system memory by setting the coherent DMA mask
> to 0xffffffff when creating a device from the device
> tree, which is crude, but seems to be what kernel v4.18
> assumed.
> 
> Cc: Russell King 
> Cc: Christoph Hellwig 
> Cc: Robin Murphy 
> Signed-off-by: Linus Walleij 
> ---
> Russell, Christoph: this is probably not the last patch.
> But it has a more accurate description of the problem.
> I still do not know what change to the kernel made
> it start triggering this, whether in the DMA API or
> in DRM :(

Doing some research, it seems the warning was added in v4.16-rc1, and
it is only a warning - it doesn't otherwise change the behaviour, so
it's not the actual warning that's the problem.

I can't see anything in the DMA API nor DRM which would cause this
either, but there are changes in the DT code.

However, there are changes to of_dma_configure() which will be called
just before the driver's probe function is called - and I guess the
culprit is 4d8bde883bfb ("OF: Don't set default coherent DMA mask").
So I guess that's the root cause of the problem, and indeed this
change was made in 4.19-rc1.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up


Re: [Patch v15 4/5] dt-bindings: arm-smmu: Add bindings for qcom,smmu-v2

2018-08-31 Thread Vivek Gautam

Hi Rob,


On 8/30/2018 6:13 AM, Rob Herring wrote:

On Wed, Aug 29, 2018 at 6:23 AM Vivek Gautam
 wrote:

On Wed, Aug 29, 2018 at 2:05 PM Vivek Gautam
 wrote:

Hi Rob,


On 8/29/2018 2:04 AM, Rob Herring wrote:

On Mon, Aug 27, 2018 at 04:25:50PM +0530, Vivek Gautam wrote:

Add bindings doc for Qcom's smmu-v2 implementation.

Signed-off-by: Vivek Gautam 
Reviewed-by: Tomasz Figa 
Tested-by: Srinivas Kandagatla 
---

Changes since v14:
   - This is a new patch added in v15 after noticing the new
 checkpatch warning for separate dt-bindings doc.
   - This patch also addresses comments given by Rob and Robin to add
 a list of valid values of '' in "qcom,-smmu-v2"
 compatible string.

   .../devicetree/bindings/iommu/arm,smmu.txt | 47 ++
   1 file changed, 47 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
index 8a6ffce12af5..52198a539606 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -17,10 +17,24 @@ conditions.
   "arm,mmu-401"
   "arm,mmu-500"
   "cavium,smmu-v2"
+"qcom,-smmu-v2", "qcom,smmu-v2"

The v2 in the compatible string is kind of redundant unless the SoC has
other SMMU types.

sdm845 has smmu-v2, and smmu-500 [1].


 depending on the particular implementation and/or the
 version of the architecture implemented.

+  A number of Qcom SoCs use qcom,smmu-v2 version of the IP.
+  "qcom,-smmu-v2" represents a soc specific compatible
+  string that should be present along with the "qcom,smmu-v2"
+  to facilitate SoC specific clocks/power connections and to
+  address specific bug fixes.
+  '<soc>' string in "qcom,<soc>-smmu-v2" should be one of the
+  following:
+  msm8996 - for msm8996 Qcom SoC.
+  sdm845 - for sdm845 Qcom SoC.

Rather than all this prose, it would be simpler to just add 2 lines with
the full compatibles rather than <soc>. The <soc> thing is not going to
work when/if we move bindings to json-schema also.

then we keep adding
   "qcom,msm8996-smmu-v2", "qcom,smmu-v2"
   "qcom,msm8998-smmu-v2", "qcom,smmu-v2"
   "qcom,sdm845-smmu-v2", "qcom,smmu-v2",
and from [1]
   "qcom,sdm845-smmu-500", "arm,mmu-500", etc.
for each SoC?

How about following diff on top of this patch?

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
index 52198a539606..5e6c04876533 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -17,23 +17,18 @@ conditions.
  "arm,mmu-401"
  "arm,mmu-500"
  "cavium,smmu-v2"
-"qcom,-smmu-v2", "qcom,smmu-v2"
+"qcom,smmu-v2"

depending on the particular implementation and/or the
version of the architecture implemented.

-  A number of Qcom SoCs use qcom,smmu-v2 version of the IP.
-  "qcom,-smmu-v2" represents a soc specific compatible
-  string that should be present along with the "qcom,smmu-v2"
-  to facilitate SoC specific clocks/power connections and to
-  address specific bug fixes.
-  '<soc>' string in "qcom,<soc>-smmu-v2" should be one of the
-  following:
-  msm8996 - for msm8996 Qcom SoC.
-  sdm845 - for sdm845 Qcom SoC.
-
-  An example string would be -
-  "qcom,msm8996-smmu-v2", "qcom,smmu-v2".
+  Qcom SoCs using qcom,smmu-v2 must have soc specific
+  compatible string attached to "qcom,smmu-v2" to take care
+  of SoC specific clocks/power connections and to address
+  specific bug fixes.
+  Precisely, it should be one of the following:
+  "qcom,msm8996-smmu-v2", "qcom,smmu-v2",
+  "qcom,sdm845-smmu-v2", "qcom,smmu-v2".

We don't need an explanation of why we need specific compatibles in
each binding document (though maybe we need a better explanation
somewhere). We just need to know what are valid values for compatibles
and this includes any combinations. Generally, this is just a list of
combinations.

[snip]

Fixed this in v16. Thanks.

Best regards
Vivek


[PATCH v2] of/platform: initialise AMBA default DMA masks

2018-08-31 Thread Linus Walleij
This addresses a v4.19-rc1 regression in the PL111 DRM driver
in drivers/gpu/pl111/*

The driver uses the CMA KMS helpers and will thus at some
point call down to dma_alloc_attrs() to allocate a chunk
of contiguous DMA memory for the framebuffer.

It appears that in v4.18, it was OK that this (and other
DMA mastering AMBA devices) left dev->coherent_dma_mask
blank (zero).

In v4.19-rc1 the WARN_ON_ONCE(dev && !dev->coherent_dma_mask)
in dma_alloc_attrs() in include/linux/dma-mapping.h is
triggered. The allocation later fails when get_coherent_dma_mask()
is called from __dma_alloc() and __dma_alloc() returns
NULL:

drm-clcd-pl111 dev:20: coherent DMA mask is unset
drm-clcd-pl111 dev:20: [drm:drm_fb_helper_fbdev_setup] *ERROR*
Failed to set fbdev configuration

I have not been able to properly bisect down to the actual
commit causing it because the kernel simply fails to build
at too many bisection points, pushing the bisect back to places
like the merge of entire subsystem trees, where things have
likely been resolved while merging them.

I found that Robin Murphy solved a similar problem in
a5516219b102 ("of/platform: Initialise default DMA masks")
by simply assigning dev.coherent_dma_mask and the
dev.dma_mask to point to the same when creating devices
from the device tree, and introducing the same code into
the code path creating AMBA/PrimeCell devices solved my
problem, graphics now come up.

The code simply assumes that the device can access all
of the system memory by setting the coherent DMA mask
to 0xffffffff when creating a device from the device
tree, which is crude, but seems to be what kernel v4.18
assumed.

Cc: Russell King 
Cc: Christoph Hellwig 
Cc: Robin Murphy 
Signed-off-by: Linus Walleij 
---
Russell, Christoph: this is probably not the last patch.
But it has a more accurate description of the problem.
I still do not know what change to the kernel made
it start triggering this, whether in the DMA API or
in DRM :(

I am looking into Russell's suggestion to use the DMA
ranges, and thus patch all device trees with DMA
mastering devices adding DMA ranges to them and parsing
that in the device tree AMBA/PrimeCell population code.
Possibly that is the right path forward.
---
 drivers/of/platform.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 7ba90c290a42..6c59673933e9 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -241,6 +241,10 @@ static struct amba_device *of_amba_device_create(struct device_node *node,
if (!dev)
goto err_clear_flag;
 
+   /* AMBA devices only support a single DMA mask */
+   dev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
+   dev->dev.dma_mask = &dev->dev.coherent_dma_mask;
+
/* setup generic device info */
dev->dev.of_node = of_node_get(node);
dev->dev.fwnode = &node->fwnode;
-- 
2.17.1



[PATCH 2/5 V6] x86/ioremap: strengthen the logic in early_memremap_pgprot_adjust() to adjust encryption mask

2018-08-31 Thread Lianbo Jiang
For the kdump kernel, when SME is enabled, the acpi table and dmi table
need to be remapped without the memory encryption mask. So we have to
strengthen the logic in early_memremap_pgprot_adjust(), which gives us
an opportunity to adjust the memory encryption mask.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/mm/ioremap.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index e01e6c695add..f9d9a39955f3 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -689,8 +689,15 @@ pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
encrypted_prot = true;
 
if (sme_active()) {
+/*
+ * In the kdump kernel, the acpi table and dmi table need
+ * to be remapped without the memory encryption mask. Here
+ * we strengthen the logic to adjust the memory
+ * encryption mask.
+ */
if (early_memremap_is_setup_data(phys_addr, size) ||
-   memremap_is_efi_data(phys_addr, size))
+   memremap_is_efi_data(phys_addr, size) ||
+   is_kdump_kernel())
encrypted_prot = false;
}
 
-- 
2.17.1



[PATCH 3/5 V6] kexec: allocate unencrypted control pages for kdump in case SME is enabled

2018-08-31 Thread Lianbo Jiang
When SME is enabled in the first kernel, we allocate unencrypted pages
for kdump in order to be able to boot the kdump kernel in the same way
as kexec.

Signed-off-by: Lianbo Jiang 
---
 kernel/kexec_core.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 23a83a4da38a..e7efcd1a977b 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -471,6 +471,16 @@ static struct page *kimage_alloc_crash_control_pages(struct kimage *image,
}
}
 
+   if (pages) {
+   /*
+* For kdump, we need to ensure that these pages are
+* unencrypted pages if SME is enabled.
+* Note that it is unnecessary to call
+* arch_kexec_pre_free_pages() here, which keeps the
+* code simpler.
+*/
+   arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0);
+   }
return pages;
 }
 
@@ -867,6 +877,7 @@ static int kimage_load_crash_segment(struct kimage *image,
result  = -ENOMEM;
goto out;
}
+   arch_kexec_post_alloc_pages(page_address(page), 1, 0);
ptr = kmap(page);
ptr += maddr & ~PAGE_MASK;
mchunk = min_t(size_t, mbytes,
@@ -884,6 +895,7 @@ static int kimage_load_crash_segment(struct kimage *image,
result = copy_from_user(ptr, buf, uchunk);
kexec_flush_icache_page(page);
kunmap(page);
+   arch_kexec_pre_free_pages(page_address(page), 1);
if (result) {
result = -EFAULT;
goto out;
-- 
2.17.1



[PATCH 5/5 V6] kdump/vmcore: support encrypted old memory with SME enabled

2018-08-31 Thread Lianbo Jiang
In the kdump kernel, we need to dump the old memory into the vmcore
file. If SME is enabled in the first kernel, we have to remap the old
memory with the memory encryption mask, so that it is automatically
decrypted when we read it from DRAM.

For SME kdump, there are two cases that are not supported:

 ------------------------------------------------------------
| first kernel         | second kernel        | kdump support |
| (mem_encrypt=on|off) | (mem_encrypt=on|off) |   (yes|no)    |
|----------------------+----------------------+---------------|
|         on           |          on          |      yes      |
|         off          |          off         |      yes      |
|         on           |          off         |      no       |
|         off          |          on          |      no       |
|______________________|______________________|_______________|

1. SME is enabled in the first kernel, but SME is disabled in the kdump
kernel. In this case, because the old memory is encrypted, we can't
decrypt it.

2. SME is disabled in the first kernel, but SME is enabled in the kdump
kernel. On the one hand, the old memory is unencrypted and can be
dumped as usual, so we don't need to enable SME in the kdump kernel; on
the other hand, it would increase the complexity of the code: we would
have to consider how to pass the SME flag from the first kernel to the
kdump kernel, and it is really too expensive to do this.

These patches are only for SME kdump; they don't support SEV kdump.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/Makefile |  1 +
 arch/x86/kernel/crash_dump_encrypt.c | 53 
 fs/proc/vmcore.c | 21 +++
 include/linux/crash_dump.h   | 12 +++
 4 files changed, 81 insertions(+), 6 deletions(-)
 create mode 100644 arch/x86/kernel/crash_dump_encrypt.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 8824d01c0c35..dfbeae0e35ce 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -97,6 +97,7 @@ obj-$(CONFIG_KEXEC_CORE)  += machine_kexec_$(BITS).o
 obj-$(CONFIG_KEXEC_CORE)   += relocate_kernel_$(BITS).o crash.o
 obj-$(CONFIG_KEXEC_FILE)   += kexec-bzimage64.o
 obj-$(CONFIG_CRASH_DUMP)   += crash_dump_$(BITS).o
+obj-$(CONFIG_AMD_MEM_ENCRYPT)  += crash_dump_encrypt.o
 obj-y  += kprobes/
 obj-$(CONFIG_MODULES)  += module.o
 obj-$(CONFIG_DOUBLEFAULT)  += doublefault.o
diff --git a/arch/x86/kernel/crash_dump_encrypt.c b/arch/x86/kernel/crash_dump_encrypt.c
new file mode 100644
index ..e1b1a577f197
--- /dev/null
+++ b/arch/x86/kernel/crash_dump_encrypt.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Memory preserving reboot related code.
+ *
+ * Created by: Lianbo Jiang (liji...@redhat.com)
+ * Copyright (C) RedHat Corporation, 2018. All rights reserved
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * copy_oldmem_page_encrypted - copy one page from "oldmem encrypted"
+ * @pfn: page frame number to be copied
+ * @buf: target memory address for the copy; this can be in kernel address
+ * space or user address space (see @userbuf)
+ * @csize: number of bytes to copy
+ * @offset: offset in bytes into the page (based on pfn) to begin the copy
+ * @userbuf: if set, @buf is in user address space, use copy_to_user(),
+ * otherwise @buf is in kernel address space, use memcpy().
+ *
+ * Copy a page from "oldmem encrypted". For this page, there is no pte
+ * mapped in the current kernel. We stitch up a pte, similar to
+ * kmap_atomic.
+ */
+
+ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf,
+   size_t csize, unsigned long offset, int userbuf)
+{
+   void  *vaddr;
+
+   if (!csize)
+   return 0;
+
+   vaddr = (__force void *)ioremap_encrypted(pfn << PAGE_SHIFT,
+ PAGE_SIZE);
+   if (!vaddr)
+   return -ENOMEM;
+
+   if (userbuf) {
+   if (copy_to_user((void __user *)buf, vaddr + offset, csize)) {
+   iounmap((void __iomem *)vaddr);
+   return -EFAULT;
+   }
+   } else
+   memcpy(buf, vaddr + offset, csize);
+
+   set_iounmap_nonlazy();
+   iounmap((void __iomem *)vaddr);
+   return csize;
+}
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index cbde728f8ac6..3065c8bada6a 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -25,6 +25,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include "internal.h"
 
 /* List representing chunks of contiguous memory areas and their offsets in
@@ -98,7 +101,8 @@ static int pfn_is_ram(unsigned long pfn)
 
 /* Reads a page from the oldmem device from given offset. */
 static ssize_t read_from_oldmem(char *buf, size_t count,
-   u64 *ppos, int userbuf)
+   u64 *ppos, int userbuf,
+   bool encrypted)
 {
unsigned long pfn, offset;
size_t nr_bytes;
@@ -120,8

[PATCH 4/5 V6] iommu/amd_iommu: remap the device table of IOMMU with the memory encryption mask for kdump

2018-08-31 Thread Lianbo Jiang
The kdump kernel copies the IOMMU device table from the old device
table, which is encrypted when SME is enabled in the first kernel. So
we have to remap the old device table with the memory encryption mask.

Signed-off-by: Lianbo Jiang 
---
 drivers/iommu/amd_iommu_init.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 84b3e4445d46..3931c7de7c69 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -902,12 +902,22 @@ static bool copy_device_table(void)
}
}
 
-   old_devtb_phys = entry & PAGE_MASK;
+   /*
+* When SME is enabled in the first kernel, the entry includes the
+* memory encryption mask(sme_me_mask), we must remove the memory
+* encryption mask to obtain the true physical address in kdump kernel.
+*/
+   old_devtb_phys = __sme_clr(entry) & PAGE_MASK;
+
	if (old_devtb_phys >= 0x100000000ULL) {
		pr_err("The address of old device table is above 4G, not trustworthy!\n");
return false;
}
-   old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+   old_devtb = (sme_active() && is_kdump_kernel())
+   ? (__force void *)ioremap_encrypted(old_devtb_phys,
+   dev_table_size)
+   : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+
if (!old_devtb)
return false;
 
-- 
2.17.1



[PATCH 1/5 V6] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory

2018-08-31 Thread Lianbo Jiang
When SME is enabled on an AMD machine, the memory is encrypted in the
first kernel. In this case, SME also needs to be enabled in the kdump
kernel, and we have to remap the old memory with the memory encryption
mask.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/include/asm/io.h |  3 +++
 arch/x86/mm/ioremap.c | 25 +
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 6de64840dd22..f8795f9581c7 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size);
 #define ioremap_cache ioremap_cache
 extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size, unsigned long prot_val);
 #define ioremap_prot ioremap_prot
+extern void __iomem *ioremap_encrypted(resource_size_t phys_addr,
+   unsigned long size);
+#define ioremap_encrypted ioremap_encrypted
 
 /**
  * ioremap -   map bus memory into CPU space
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index c63a545ec199..e01e6c695add 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "physaddr.h"
 
@@ -131,7 +132,8 @@ static void __ioremap_check_mem(resource_size_t addr, unsigned long size,
  * caller shouldn't need to know that small detail.
  */
 static void __iomem *__ioremap_caller(resource_size_t phys_addr,
-   unsigned long size, enum page_cache_mode pcm, void *caller)
+   unsigned long size, enum page_cache_mode pcm,
+   void *caller, bool encrypted)
 {
unsigned long offset, vaddr;
resource_size_t last_addr;
@@ -199,7 +201,7 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
 * resulting mapping.
 */
prot = PAGE_KERNEL_IO;
-   if (sev_active() && mem_flags.desc_other)
+   if ((sev_active() && mem_flags.desc_other) || encrypted)
prot = pgprot_encrypted(prot);
 
switch (pcm) {
@@ -291,7 +293,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS;
 
return __ioremap_caller(phys_addr, size, pcm,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_nocache);
 
@@ -324,7 +326,7 @@ void __iomem *ioremap_uc(resource_size_t phys_addr, unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC;
 
return __ioremap_caller(phys_addr, size, pcm,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL_GPL(ioremap_uc);
 
@@ -341,7 +343,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
 void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_wc);
 
@@ -358,14 +360,21 @@ EXPORT_SYMBOL(ioremap_wc);
 void __iomem *ioremap_wt(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_wt);
 
+void __iomem *ioremap_encrypted(resource_size_t phys_addr, unsigned long size)
+{
+   return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
+   __builtin_return_address(0), true);
+}
+EXPORT_SYMBOL(ioremap_encrypted);
+
 void __iomem *ioremap_cache(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_cache);
 
@@ -374,7 +383,7 @@ void __iomem *ioremap_prot(resource_size_t phys_addr, unsigned long size,
 {
return __ioremap_caller(phys_addr, size,
pgprot2cachemode(__pgprot(prot_val)),
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_prot);
 
-- 
2.17.1



[PATCH 0/5 V6] Support kdump for AMD secure memory encryption(SME)

2018-08-31 Thread Lianbo Jiang
When SME is enabled on an AMD machine, we also need to support kdump.
Because the memory is encrypted in the first kernel, we remap the old
memory into the kdump kernel for dumping data, and SME must also be
enabled in the kdump kernel; otherwise the old memory cannot be
decrypted.

For kdump, it is necessary to distinguish whether the memory is
encrypted. Furthermore, we should also know which part of the memory is
encrypted or decrypted. We will appropriately remap the memory
according to the specific situation in order to tell the cpu how to
access the memory.

As we know, a page of memory that is marked as encrypted will be
automatically decrypted when read from DRAM, and automatically
encrypted when written to DRAM. If the old memory is encrypted, we have
to remap it with the memory encryption mask, which will automatically
decrypt the old memory when we read the data.
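
For example (a minimal sketch of the access pattern, using the
ioremap_encrypted() helper added in patch 1/5; paddr, buf, offset and
len are illustrative):

	void *vaddr = (__force void *)ioremap_encrypted(paddr, PAGE_SIZE);

	memcpy(buf, vaddr + offset, len);	/* hardware decrypts on read */
	iounmap((void __iomem *)vaddr);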

For kdump (SME), there are two cases that are not supported:

 ------------------------------------------------------------
| first kernel         | second kernel        | kdump support |
| (mem_encrypt=on|off) | (mem_encrypt=on|off) |   (yes|no)    |
|----------------------+----------------------+---------------|
|         on           |          on          |      yes      |
|         off          |          off         |      yes      |
|         on           |          off         |      no       |
|         off          |          on          |      no       |
|______________________|______________________|_______________|

1. SME is enabled in the first kernel, but SME is disabled in the kdump
kernel. In this case, because the old memory is encrypted, we can't
decrypt it.

2. SME is disabled in the first kernel, but SME is enabled in the kdump
kernel. It is unnecessary to support this case, because the old memory
is unencrypted and can be dumped as usual, so we don't need to enable
SME in the kdump kernel. Also, if we had to support this scenario, it
would increase the complexity of the code: we would have to consider
how to pass the SME flag from the first kernel to the kdump kernel, in
order to let the kdump kernel know whether the old memory is encrypted.

There are two methods to pass the SME flag to the kdump kernel. The
first method is to modify the assembly code, which includes some common
code and the path is too long. The second method is to use the kexec
tool, which would require the SME flag to be exported in the first
kernel via "proc" or "sysfs"; kexec tools would read the SME flag from
"proc" or "sysfs" when loading the image, and the SME flag would then
be saved in boot_params, so we could properly remap the old memory
according to the previously saved SME flag. But it is too expensive to
do this.

These patches are only for SME kdump; they don't support SEV kdump.

Test tools:
makedumpfile[v1.6.3]: https://github.com/LianboJ/makedumpfile
commit e1de103eca8f (A draft for kdump vmcore about AMD SME)
Note: This patch can only dump vmcore in the case of SME enabled.

crash-7.2.3: https://github.com/crash-utility/crash.git
commit 001f77a05585 (Fix for Linux 4.19-rc1 and later kernels that contain
 kernel commit 7290d58095712a89f845e1bca05334796dd49ed2)

Test environment:
HP ProLiant DL385Gen10 AMD EPYC 7251
8-Core Processor
32768 MB memory
600 GB disk space

Linux 4.19-rc1:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
commit 5b394b2ddf0347bef56e50c69a58773c94343ff3

Reference:
AMD64 Architecture Programmer's Manual
https://support.amd.com/TechDocs/24593.pdf

Changes since v5:
1. Modified the patch logs.

Some known issues:
1. about SME
The upstream kernel will hang on an HP machine (DL385Gen10 AMD EPYC
7251) when we execute the kexec command as follows:

# kexec -l /boot/vmlinuz-4.19.0-rc1+ --initrd=/boot/initramfs-4.19.0-rc1+.img 
--command-line="root=/dev/mapper/rhel_hp--dl385g10--03-root ro mem_encrypt=on 
rd.lvm.lv=rhel_hp-dl385g10-03/root rd.lvm.lv=rhel_hp-dl385g10-03/swap 
console=ttyS0,115200n81 LANG=en_US.UTF-8 earlyprintk=serial debug nokaslr"
# kexec -e (or reboot)

But this issue cannot be reproduced on the Speedway machine, and it is
unrelated to my posted patches.

The kernel log:
[ 1248.932239] kexec_core: Starting new kernel
early console in extract_kernel
input_data: 0x00087e91c3b4
input_len: 0x0067fcbd
output: 0x00087d40
output_len: 0x01b6fa90
kernel_total_size: 0x01a9d000
trampoline_32bit: 0x00099000

Decompressing Linux...
Parsing ELF...[---Here the system will hang]

2. about SEV
The upstream kernel (host OS) doesn't work on the host side; some
SEV-related drivers always go wrong on the host side, so we can't boot
an SEV guest OS to test the kdump patch. Maybe it is more reasonable to
improve SEV in another patch. When those drivers work on the host side
and a virtual machine (SEV guest OS) can be booted, it will be suitable
to fix SEV for kdump.

[  369.426131] INFO: task systemd-udevd:865 blocked for more than 120 seconds.
[  369.433177]   Not tainted 4.17.0-rc5+ #60
[  369.437585] "echo 0 > /proc/sys/kernel/hung_t