Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME

2017-05-31 Thread Xunlei Pang
On 05/31/2017 at 01:46 AM, Tom Lendacky wrote:
> On 5/25/2017 11:17 PM, Xunlei Pang wrote:
>> On 04/19/2017 at 05:21 AM, Tom Lendacky wrote:
>>> Provide support so that kexec can be used to boot a kernel when SME is
>>> enabled.
>>>
>>> Support is needed to allocate pages for kexec without encryption.  This
>>> is needed in order to be able to reboot in the kernel in the same manner
>>> as originally booted.
>>
>> Hi Tom,
>>
>> Looks like kdump will break, I didn't see similar handling for kdump
>> cases, see kernel:
>>  kimage_alloc_crash_control_pages(), kimage_load_crash_segment(), etc.
>>
>> We need to support kdump with SME; kdump
>> kernel/initramfs/purgatory/elfcorehdr/etc. are all loaded into the
>> reserved memory (see crashkernel=X) by userspace kexec-tools.
>> I think a straightforward way would be to mark the whole reserved memory
>> range without encryption before loading all the kexec segments for
>> kdump; I guess we can handle this easily in
>> arch_kexec_unprotect_crashkres().
>
> Yes, that would work.
>
>>
>> Moreover, now that "elfcorehdr=X" is left as decrypted, it needs to be
>> remapped to the encrypted data.
>
> This is an area that I'm not familiar with, so I don't completely
> understand the flow in regards to where/when/how the ELF headers are
> copied and what needs to be done.
>
> Can you elaborate a bit on this?

"elfcorehdr" is generated by userspace kexec-tools
(git://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git); it is
actually the ELF core header data (ELF header plus PT_LOAD/PT_NOTE program
headers), see kexec/crashdump-elf.c::FUNC().

For the kdump case, it is put in some reserved crash memory allocated by
kexec-tools, and the start address of that allocated reserved crash memory
is passed to the kdump kernel via "elfcorehdr=". Please see the kernel
functions setup_elfcorehdr() and vmcore_init() for how it is parsed by the
kdump kernel.

Regards,
Xunlei

>>
>>>
>>> Additionally, when shutting down all of the CPUs we need to be sure to
>>> flush the caches and then halt. This is needed when booting from a state
>>> where SME was not active into a state where SME is active (or vice-versa).
>>> Without these steps, it is possible for cache lines to exist for the same
>>> physical location but tagged both with and without the encryption bit. This
>>> can cause random memory corruption when caches are flushed depending on
>>> which cacheline is written last.
>>>
>>> Signed-off-by: Tom Lendacky <thomas.lenda...@amd.com>
>>> ---
>>>   arch/x86/include/asm/init.h  |1 +
>>>   arch/x86/include/asm/irqflags.h  |5 +
>>>   arch/x86/include/asm/kexec.h |8 
>>>   arch/x86/include/asm/pgtable_types.h |1 +
>>>   arch/x86/kernel/machine_kexec_64.c   |   35 +-
>>>   arch/x86/kernel/process.c|   26 +++--
>>>   arch/x86/mm/ident_map.c  |   11 +++
>>>   include/linux/kexec.h|   14 ++
>>>   kernel/kexec_core.c  |7 +++
>>>   9 files changed, 101 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
>>> index 737da62..b2ec511 100644
>>> --- a/arch/x86/include/asm/init.h
>>> +++ b/arch/x86/include/asm/init.h
>>> @@ -6,6 +6,7 @@ struct x86_mapping_info {
>>>   void *context; /* context for alloc_pgt_page */
>>>   unsigned long pmd_flag; /* page flag for PMD entry */
>>>   unsigned long offset; /* ident mapping offset */
>>> +unsigned long kernpg_flag; /* kernel pagetable flag override */
>>>   };
>>> int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
>>> diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
>>> index ac7692d..38b5920 100644
>>> --- a/arch/x86/include/asm/irqflags.h
>>> +++ b/arch/x86/include/asm/irqflags.h
>>> @@ -58,6 +58,11 @@ static inline __cpuidle void native_halt(void)
>>>   asm volatile("hlt": : :"memory");
>>>   }
>>>   +static inline __cpuidle void native_wbinvd_halt(void)
>>> +{
>>> +asm volatile("wbinvd; hlt" : : : "memory");
>>> +}
>>> +
>>>   #endif
>>> #

Re: [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption

2017-05-25 Thread Xunlei Pang
On 05/26/2017 at 10:49 AM, Dave Young wrote:
> Ccing Xunlei, he is reading the patches to see what needs to be done for
> kdump. There should still be several places to handle to make kdump work.
>
> On 05/18/17 at 07:01pm, Borislav Petkov wrote:
>> On Tue, Apr 18, 2017 at 04:22:12PM -0500, Tom Lendacky wrote:
>>> Add sysfs support for SME so that user-space utilities (kdump, etc.) can
>>> determine if SME is active.
>> But why do user-space tools need to know that?
>>
>> I mean, when we load the kdump kernel, we do it with the first kernel,
>> with the kexec_load() syscall, AFAICT. And that code does a lot of
>> things during that init, like machine_kexec_prepare()->init_pgtable() to
>> prepare the ident mapping of the second kernel, for example.
>>
>> What I'm aiming at is that the first kernel knows *exactly* whether SME
>> is enabled or not and doesn't need to tell the second one through some
>> sysfs entries - it can do that during loading.
>>
>> So I don't think we need any userspace things at all...
> If the kdump kernel can get the SME status from a hardware register then
> this should not be necessary and this patch can be dropped.

Yes, I also agree with dropping this one.

Regards,
Xunlei
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME

2017-05-25 Thread Xunlei Pang
On 04/19/2017 at 05:21 AM, Tom Lendacky wrote:
> Provide support so that kexec can be used to boot a kernel when SME is
> enabled.
>
> Support is needed to allocate pages for kexec without encryption.  This
> is needed in order to be able to reboot in the kernel in the same manner
> as originally booted.

Hi Tom,

Looks like kdump will break, I didn't see similar handling for kdump cases,
see kernel:
kimage_alloc_crash_control_pages(), kimage_load_crash_segment(), etc.

We need to support kdump with SME; kdump
kernel/initramfs/purgatory/elfcorehdr/etc. are all loaded into the reserved
memory (see crashkernel=X) by userspace kexec-tools.
I think a straightforward way would be to mark the whole reserved memory
range without encryption before loading all the kexec segments for kdump; I
guess we can handle this easily in arch_kexec_unprotect_crashkres().

Moreover, now that "elfcorehdr=X" is left as decrypted, it needs to be
remapped to the encrypted data.

Regards,
Xunlei

>
> Additionally, when shutting down all of the CPUs we need to be sure to
> flush the caches and then halt. This is needed when booting from a state
> where SME was not active into a state where SME is active (or vice-versa).
> Without these steps, it is possible for cache lines to exist for the same
> physical location but tagged both with and without the encryption bit. This
> can cause random memory corruption when caches are flushed depending on
> which cacheline is written last.
>
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/include/asm/init.h  |1 +
>  arch/x86/include/asm/irqflags.h  |5 +
>  arch/x86/include/asm/kexec.h |8 
>  arch/x86/include/asm/pgtable_types.h |1 +
>  arch/x86/kernel/machine_kexec_64.c   |   35 +-
>  arch/x86/kernel/process.c|   26 +++--
>  arch/x86/mm/ident_map.c  |   11 +++
>  include/linux/kexec.h|   14 ++
>  kernel/kexec_core.c  |7 +++
>  9 files changed, 101 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
> index 737da62..b2ec511 100644
> --- a/arch/x86/include/asm/init.h
> +++ b/arch/x86/include/asm/init.h
> @@ -6,6 +6,7 @@ struct x86_mapping_info {
>   void *context;   /* context for alloc_pgt_page */
>   unsigned long pmd_flag;  /* page flag for PMD entry */
>   unsigned long offset;/* ident mapping offset */
> + unsigned long kernpg_flag;   /* kernel pagetable flag override */
>  };
>  
>  int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
> diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
> index ac7692d..38b5920 100644
> --- a/arch/x86/include/asm/irqflags.h
> +++ b/arch/x86/include/asm/irqflags.h
> @@ -58,6 +58,11 @@ static inline __cpuidle void native_halt(void)
>   asm volatile("hlt": : :"memory");
>  }
>  
> +static inline __cpuidle void native_wbinvd_halt(void)
> +{
> + asm volatile("wbinvd; hlt" : : : "memory");
> +}
> +
>  #endif
>  
>  #ifdef CONFIG_PARAVIRT
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 70ef205..e8183ac 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -207,6 +207,14 @@ struct kexec_entry64_regs {
>   uint64_t r15;
>   uint64_t rip;
>  };
> +
> +extern int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages,
> +gfp_t gfp);
> +#define arch_kexec_post_alloc_pages arch_kexec_post_alloc_pages
> +
> +extern void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages);
> +#define arch_kexec_pre_free_pages arch_kexec_pre_free_pages
> +
>  #endif
>  
>  typedef void crash_vmclear_fn(void);
> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index ce8cb1c..0f326f4 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -213,6 +213,7 @@ enum page_cache_mode {
>  #define PAGE_KERNEL  __pgprot(__PAGE_KERNEL | _PAGE_ENC)
>  #define PAGE_KERNEL_RO   __pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
>  #define PAGE_KERNEL_EXEC __pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
> +#define PAGE_KERNEL_EXEC_NOENC   __pgprot(__PAGE_KERNEL_EXEC)
>  #define PAGE_KERNEL_RX   __pgprot(__PAGE_KERNEL_RX | _PAGE_ENC)
>  #define PAGE_KERNEL_NOCACHE  __pgprot(__PAGE_KERNEL_NOCACHE | _PAGE_ENC)
>  #define PAGE_KERNEL_LARGE__pgprot(__PAGE_KERNEL_LARGE | _PAGE_ENC)
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 085c3b3..11c0ca9 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -86,7 +86,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
>  

[PATCH v3] iommu/vt-d: Flush old iommu caches for kdump when the device gets context mapped

2016-12-05 Thread Xunlei Pang
We met the DMAR fault both on hpsa P420i and P421 SmartArray controllers
under kdump; it can be steadily reproduced on several different machines.
The dmesg log is like:
HP HPSA Driver (v 3.4.16-0)
hpsa 0000:02:00.0: using doorbell to reset controller
hpsa 0000:02:00.0: board ready after hard reset.
hpsa 0000:02:00.0: Waiting for controller to respond to no-op
DMAR: Setting identity map for device 0000:02:00.0 [0xe8000 - 0xe8fff]
DMAR: Setting identity map for device 0000:02:00.0 [0xf4000 - 0xf4fff]
DMAR: Setting identity map for device 0000:02:00.0 [0xbdf6e000 - 0xbdf6efff]
DMAR: Setting identity map for device 0000:02:00.0 [0xbdf6f000 - 0xbdf7efff]
DMAR: Setting identity map for device 0000:02:00.0 [0xbdf7f000 - 0xbdf82fff]
DMAR: Setting identity map for device 0000:02:00.0 [0xbdf83000 - 0xbdf84fff]
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Read] Request device [02:00.0] fault addr f000 [fault reason 06] PTE Read access is not set
hpsa 0000:02:00.0: controller message 03:00 timed out
hpsa 0000:02:00.0: no-op failed; re-trying

After some debugging, we found that the fault addr is from DMA initiated at
the driver probe stage after reset (not in-flight DMA), and the
corresponding pte entry value is correct; the fault is likely due to the
old iommu caches of the in-flight DMA before it.

Thus we need to flush the old cache after context mapping is set up for the
device, where the device is supposed to finish reset at its driver probe
stage and no in-flight DMA exists hereafter.

I'm not sure if the hardware is responsible for invalidating all the
related caches allocated in the iommu hardware before, but that seems not
to be the case for hpsa; actually many device drivers have problems in
properly resetting the hardware. Anyway, flushing (again) by software in
the kdump kernel when the device gets context mapped, which is a quite
infrequent operation, does little harm.

With this patch, the problematic machine can survive the kdump tests.

CC: Myron Stowe <myron.st...@gmail.com>
CC: Joseph Szczypek <jszcz...@redhat.com>
CC: Don Brace <don.br...@microsemi.com>
CC: Baoquan He <b...@redhat.com>
CC: Dave Young <dyo...@redhat.com>
Fixes: 091d42e43d21 ("iommu/vt-d: Copy translation tables from old kernel")
Fixes: dbcd861f252d ("iommu/vt-d: Do not re-use domain-ids from the old kernel")
Fixes: cf484d0e6939 ("iommu/vt-d: Mark copied context entries")
Signed-off-by: Xunlei Pang <xlp...@redhat.com>
---
v2->v3:
Flush context cache only and add Fixes-tag, according to Joerg's comments.

 drivers/iommu/intel-iommu.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 3965e73..624eac9 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2024,6 +2024,25 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
if (context_present(context))
goto out_unlock;
 
+   /*
+* For kdump cases, old valid entries may be cached due to the
+* in-flight DMA and copied pgtable, but there is no unmapping
+* behaviour for them, thus we need an explicit cache flush for
+* the newly-mapped device. For kdump, at this point, the device
+* is supposed to finish reset at its driver probe stage, so no
+* in-flight DMA will exist, and we don't need to worry anymore
+* hereafter.
+*/
+   if (context_copied(context)) {
+   u16 did_old = context_domain_id(context);
+
+   if (did_old >= 0 && did_old < cap_ndoms(iommu->cap))
+   iommu->flush.flush_context(iommu, did_old,
+  (((u16)bus) << 8) | devfn,
+  DMA_CCMD_MASK_NOBIT,
+  DMA_CCMD_DEVICE_INVL);
+   }
+
pgd = domain->pgd;
 
context_clear_entry(context);
-- 
1.8.3.1



Re: [PATCH] iommu/vt-d: Flush old iotlb for kdump when the device gets context mapped

2016-12-01 Thread Xunlei Pang
On 12/01/2016 at 06:33 PM, Joerg Roedel wrote:
> On Thu, Dec 01, 2016 at 10:15:45AM +0800, Xunlei Pang wrote:
>> index 3965e73..624eac9 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -2024,6 +2024,25 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
>> if (context_present(context))
>> goto out_unlock;
>>  
>> +   /*
>> +* For kdump cases, old valid entries may be cached due to the
>> +* in-flight DMA and copied pgtable, but there is no unmapping
>> +* behaviour for them, thus we need an explicit cache flush for
>> +* the newly-mapped device. For kdump, at this point, the device
>> +* is supposed to finish reset at its driver probe stage, so no
>> +* in-flight DMA will exist, and we don't need to worry anymore
>> +* hereafter.
>> +*/
>> +   if (context_copied(context)) {
>> +   u16 did_old = context_domain_id(context);
>> +
>> +   if (did_old >= 0 && did_old < cap_ndoms(iommu->cap))
>> +   iommu->flush.flush_context(iommu, did_old,
>> +  (((u16)bus) << 8) | devfn,
>> +  DMA_CCMD_MASK_NOBIT,
>> +  DMA_CCMD_DEVICE_INVL);
>> +   }
>> +
>> pgd = domain->pgd;
> Yes, this looks better. Have you tested it the same way as the old
> patch?

Yes, I have tested and it works, will send v3 later.

Regards,
Xunlei


Re: [PATCH] iommu/vt-d: Flush old iotlb for kdump when the device gets context mapped

2016-11-30 Thread Xunlei Pang
On 11/30/2016 at 10:26 PM, Joerg Roedel wrote:
> On Wed, Nov 30, 2016 at 06:23:34PM +0800, Baoquan He wrote:
>> OK, talked with Xunlei. The old cache could be entry with present bit
>> set.
> -EPARSE
>
> Anyway, what I was trying to say is, that the IOMMU TLB is tagged with
> domain-ids, and that there is also a context-cache which maps device-ids
> to domain-ids.
>
> If we update the context entry then we need to flush only the context
> entry, as it will point to a new domain-id then and future IOTLB lookups
> in the IOMMU will be using the new domain-id and do not match the old
> entries.

Hi Joerg,

Thanks for the explanation, and we still need to flush the context cache
using the old domain-id, right?
How about the following update?

index 3965e73..624eac9 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2024,6 +2024,25 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
if (context_present(context))
goto out_unlock;
 
+   /*
+* For kdump cases, old valid entries may be cached due to the
+* in-flight DMA and copied pgtable, but there is no unmapping
+* behaviour for them, thus we need an explicit cache flush for
+* the newly-mapped device. For kdump, at this point, the device
+* is supposed to finish reset at its driver probe stage, so no
+* in-flight DMA will exist, and we don't need to worry anymore
+* hereafter.
+*/
+   if (context_copied(context)) {
+   u16 did_old = context_domain_id(context);
+
+   if (did_old >= 0 && did_old < cap_ndoms(iommu->cap))
+   iommu->flush.flush_context(iommu, did_old,
+  (((u16)bus) << 8) | devfn,
+  DMA_CCMD_MASK_NOBIT,
+  DMA_CCMD_DEVICE_INVL);
+   }
+
pgd = domain->pgd;



Re: [PATCH] iommu/vt-d: Flush old iotlb for kdump when the device gets context mapped

2016-11-30 Thread Xunlei Pang
On 11/29/2016 at 10:35 PM, Joerg Roedel wrote:
> On Thu, Nov 17, 2016 at 10:47:28AM +0800, Xunlei Pang wrote:
>> As per the comment, the code here only needs to flush context caches
>> for the special domain 0 which is used to tag the
>> non-present/erroneous caches, seems we should flush the old domain id
>> of present entries for kdump according to the analysis, other than the
>> new-allocated domain id. Let me ponder more on this.
> Flushing the context entry only is fine. The old domain-id will not be
> re-used anyway, so there is no point in reading it out of the context
> table and flush it.

Do you mean to flush the context entry using the new-allocated domain id?

Yes, the old domain-ids will not be re-used as they were reserved when
copied, but they may still be cached by in-flight DMA access.

Here is what the things seem to be from my understanding, and why I want to
flush using the old domain id:
1) In kdump mode, old tables are copied, and all the iommu caches are
   flushed.
2) There comes some in-flight DMA before the device's new context is
   mapped, so translation caches (context, iotlb, etc.) are created tagging
   the old domain-id in the iommu hardware.
3) At the driver probe stage, the device is reset, and no in-flight DMA
   will exist. Here I assumed that the device reset won't flush the old
   caches in the iommu hardware related to this device. I haven't found any
   relevant specification, please correct me if I am wrong.
4) Then the new context is set up, and new DMA is initiated, which hits the
   old cache created in 2) as currently there is no such flush action, so
   the DMAR fault happens.

I already posted v2 to flush context/iotlb using the old domain-id:
https://lkml.org/lkml/2016/11/18/514

Regards,
Xunlei

>
> Also, please add a Fixes-tag when you re-post this patch.
>
>
>   Joerg
>



Re: [PATCH v2] iommu/vt-d: Flush old iommu caches for kdump when the device gets context mapped

2016-11-27 Thread Xunlei Pang
Ping Joerg/David, do you have any comment on it?
On 2016/11/19 at 00:23, Xunlei Pang wrote:
> We met the DMAR fault both on hpsa P420i and P421 SmartArray controllers
> under kdump, it can be steadily reproduced on several different machines,
> the dmesg log is like(running on 4.9.0-rc5+):
> HP HPSA Driver (v 3.4.16-0)
> hpsa 0000:02:00.0: using doorbell to reset controller
> hpsa 0000:02:00.0: board ready after hard reset.
> hpsa 0000:02:00.0: Waiting for controller to respond to no-op
> DMAR: Setting identity map for device 0000:02:00.0 [0xe8000 - 0xe8fff]
> DMAR: Setting identity map for device 0000:02:00.0 [0xf4000 - 0xf4fff]
> DMAR: Setting identity map for device 0000:02:00.0 [0xbdf6e000 - 0xbdf6efff]
> DMAR: Setting identity map for device 0000:02:00.0 [0xbdf6f000 - 0xbdf7efff]
> DMAR: Setting identity map for device 0000:02:00.0 [0xbdf7f000 - 0xbdf82fff]
> DMAR: Setting identity map for device 0000:02:00.0 [0xbdf83000 - 0xbdf84fff]
> DMAR: DRHD: handling fault status reg 2
> DMAR: [DMA Read] Request device [02:00.0] fault addr f000 [fault reason 06] PTE Read access is not set
> hpsa 0000:02:00.0: controller message 03:00 timed out
> hpsa 0000:02:00.0: no-op failed; re-trying
>
> After some debugging, we found that the fault addr is from DMA initiated at
> the driver probe stage after reset(not in-flight DMA), and the corresponding
> pte entry value is correct, the fault is likely due to the old iommu caches
> of the in-flight DMA before it.
>
> Thus we need to flush the old cache after context mapping is setup for the
> device, where the device is supposed to finish reset at its driver probe
> stage and no in-flight DMA exists hereafter.
>
> I'm not sure if the hardware is responsible for invalidating all the
> related caches allocated in iommu hardware during reset, but that seems
> not to be the case for hpsa; actually many device drivers even have
> problems properly resetting the hardware.
> Anyway, flushing (again) by software in kdump mode when the device gets
> context mapped, which is a quite infrequent operation, does little harm.
>
> With this patch, the problematic machine can survive the kdump tests.
>
> CC: Myron Stowe <myron.st...@redhat.com>
> CC: Joseph Szczypek <jszcz...@redhat.com>
> CC: Don Brace <don.br...@microsemi.com>
> CC: Baoquan He <b...@redhat.com>
> CC: Dave Young <dyo...@redhat.com>
> Signed-off-by: Xunlei Pang <xlp...@redhat.com>
> ---
> v1 -> v2:
> Flush caches using old domain id.
>
>  drivers/iommu/intel-iommu.c | 22 ++
>  1 file changed, 22 insertions(+)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 3965e73..653304d 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -2024,6 +2024,28 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
>   if (context_present(context))
>   goto out_unlock;
>  
> + /*
> +  * For kdump cases, old valid entries may be cached due to the
> +  * in-flight DMA and copied pgtable, but there is no unmapping
> +  * behaviour for them, thus we need an explicit cache flush for
> +  * the newly-mapped device. For kdump, at this point, the device
> +  * is supposed to finish reset at its driver probe stage, so no
> +  * in-flight DMA will exist, and we don't need to worry anymore
> +  * hereafter.
> +  */
> + if (context_copied(context)) {
> + u16 did_old = context_domain_id(context);
> +
> + if (did_old >= 0 && did_old < cap_ndoms(iommu->cap)) {
> + iommu->flush.flush_context(iommu, did_old,
> +(((u16)bus) << 8) | devfn,
> +DMA_CCMD_MASK_NOBIT,
> +DMA_CCMD_DEVICE_INVL);
> + iommu->flush.flush_iotlb(iommu, did_old, 0, 0,
> +DMA_TLB_DSI_FLUSH);
> + }
> + }
> +
>   pgd = domain->pgd;
>  
>   context_clear_entry(context);



[PATCH v2] iommu/vt-d: Flush old iommu caches for kdump when the device gets context mapped

2016-11-18 Thread Xunlei Pang
We met the DMAR fault both on hpsa P420i and P421 SmartArray controllers
under kdump; it can be steadily reproduced on several different machines.
The dmesg log is like (running on 4.9.0-rc5+):
HP HPSA Driver (v 3.4.16-0)
hpsa 0000:02:00.0: using doorbell to reset controller
hpsa 0000:02:00.0: board ready after hard reset.
hpsa 0000:02:00.0: Waiting for controller to respond to no-op
DMAR: Setting identity map for device 0000:02:00.0 [0xe8000 - 0xe8fff]
DMAR: Setting identity map for device 0000:02:00.0 [0xf4000 - 0xf4fff]
DMAR: Setting identity map for device 0000:02:00.0 [0xbdf6e000 - 0xbdf6efff]
DMAR: Setting identity map for device 0000:02:00.0 [0xbdf6f000 - 0xbdf7efff]
DMAR: Setting identity map for device 0000:02:00.0 [0xbdf7f000 - 0xbdf82fff]
DMAR: Setting identity map for device 0000:02:00.0 [0xbdf83000 - 0xbdf84fff]
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Read] Request device [02:00.0] fault addr f000 [fault reason 06] PTE Read access is not set
hpsa 0000:02:00.0: controller message 03:00 timed out
hpsa 0000:02:00.0: no-op failed; re-trying

After some debugging, we found that the fault addr is from DMA initiated at
the driver probe stage after reset (not in-flight DMA), and the
corresponding pte entry value is correct; the fault is likely due to the
old iommu caches of the in-flight DMA before it.

Thus we need to flush the old cache after context mapping is set up for the
device, where the device is supposed to finish reset at its driver probe
stage and no in-flight DMA exists hereafter.

I'm not sure if the hardware is responsible for invalidating all the
related caches allocated in iommu hardware during reset, but that seems not
to be the case for hpsa; actually many device drivers even have problems
properly resetting the hardware.
Anyway, flushing (again) by software in kdump mode when the device gets
context mapped, which is a quite infrequent operation, does little harm.

With this patch, the problematic machine can survive the kdump tests.

CC: Myron Stowe <myron.st...@redhat.com>
CC: Joseph Szczypek <jszcz...@redhat.com>
CC: Don Brace <don.br...@microsemi.com>
CC: Baoquan He <b...@redhat.com>
CC: Dave Young <dyo...@redhat.com>
Signed-off-by: Xunlei Pang <xlp...@redhat.com>
---
v1 -> v2:
Flush caches using old domain id.

 drivers/iommu/intel-iommu.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 3965e73..653304d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2024,6 +2024,28 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
if (context_present(context))
goto out_unlock;
 
+   /*
+* For kdump cases, old valid entries may be cached due to the
+* in-flight DMA and copied pgtable, but there is no unmapping
+* behaviour for them, thus we need an explicit cache flush for
+* the newly-mapped device. For kdump, at this point, the device
+* is supposed to finish reset at its driver probe stage, so no
+* in-flight DMA will exist, and we don't need to worry anymore
+* hereafter.
+*/
+   if (context_copied(context)) {
+   u16 did_old = context_domain_id(context);
+
+   if (did_old >= 0 && did_old < cap_ndoms(iommu->cap)) {
+   iommu->flush.flush_context(iommu, did_old,
+  (((u16)bus) << 8) | devfn,
+  DMA_CCMD_MASK_NOBIT,
+  DMA_CCMD_DEVICE_INVL);
+   iommu->flush.flush_iotlb(iommu, did_old, 0, 0,
+  DMA_TLB_DSI_FLUSH);
+   }
+   }
+
pgd = domain->pgd;
 
context_clear_entry(context);
-- 
1.8.3.1



Re: [PATCH] iommu/vt-d: Flush old iotlb for kdump when the device gets context mapped

2016-11-16 Thread Xunlei Pang
On 2016/11/16 at 22:58, Myron Stowe wrote:
> On Wed, Nov 16, 2016 at 2:13 AM, Xunlei Pang <xp...@redhat.com> wrote:
>> Ccing David
>> On 2016/11/16 at 17:02, Xunlei Pang wrote:
>>> We met the DMAR fault both on hpsa P420i and P421 SmartArray controllers
>>> under kdump, it can be steadily reproduced on several different machines,
>>> the dmesg log is like:
>>> HP HPSA Driver (v 3.4.16-0)
>>> hpsa 0000:02:00.0: using doorbell to reset controller
>>> hpsa 0000:02:00.0: board ready after hard reset.
>>> hpsa 0000:02:00.0: Waiting for controller to respond to no-op
>>> DMAR: Setting identity map for device 0000:02:00.0 [0xe8000 - 0xe8fff]
>>> DMAR: Setting identity map for device 0000:02:00.0 [0xf4000 - 0xf4fff]
>>> DMAR: Setting identity map for device 0000:02:00.0 [0xbdf6e000 - 0xbdf6efff]
>>> DMAR: Setting identity map for device 0000:02:00.0 [0xbdf6f000 - 0xbdf7efff]
>>> DMAR: Setting identity map for device 0000:02:00.0 [0xbdf7f000 - 0xbdf82fff]
>>> DMAR: Setting identity map for device 0000:02:00.0 [0xbdf83000 - 0xbdf84fff]
>>> DMAR: DRHD: handling fault status reg 2
>>> DMAR: [DMA Read] Request device [02:00.0] fault addr f000 [fault reason 06] PTE Read access is not set
>>> hpsa 0000:02:00.0: controller message 03:00 timed out
>>> hpsa 0000:02:00.0: no-op failed; re-trying
>>>
>>> After some debugging, we found that the corresponding pte entry value
>>> is correct, and the value of the iommu caching mode is 0, the fault is
>>> probably due to the old iotlb cache of the in-flight DMA.
>>>
>>> Thus need to flush the old iotlb after context mapping is setup for the
>>> device, where the device is supposed to finish reset at its driver probe
>>> stage and no in-flight DMA exists hereafter.
>>>
>>> With this patch, all our problematic machines can survive the kdump tests.
>>>
>>> CC: Myron Stowe <myron.st...@redhat.com>
>>> CC: Don Brace <don.br...@microsemi.com>
>>> CC: Baoquan He <b...@redhat.com>
>>> CC: Dave Young <dyo...@redhat.com>
>>> Tested-by: Joseph Szczypek <jszcz...@redhat.com>
>>> Signed-off-by: Xunlei Pang <xlp...@redhat.com>
>>> ---
>>>  drivers/iommu/intel-iommu.c | 11 +--
>>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>>> index 3965e73..eb79288 100644
>>> --- a/drivers/iommu/intel-iommu.c
>>> +++ b/drivers/iommu/intel-iommu.c
>>> @@ -2067,9 +2067,16 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
>>>* It's a non-present to present mapping. If hardware doesn't cache
>>>* non-present entry we only need to flush the write-buffer. If the
>>>* _does_ cache non-present entries, then it does so in the special
> If this does get accepted then we should fix the above grammar also -
>   "If the _does_ cache ..." -> "If the hardware _does_ cache ..."

Yes, but this reminds me of something.
As per the comment, the code here only needs to flush context caches for
the special domain 0, which is used to tag the non-present/erroneous
caches; according to the analysis, it seems we should flush the old domain
id of present entries for kdump, rather than the new-allocated domain id.
Let me ponder more on this.

Regards,
Xunlei

>
>>> -  * domain #0, which we have to flush:
>>> +  * domain #0, which we have to flush.
>>> +  *
>>> +  * For kdump cases, present entries may be cached due to the in-flight
>>> +  * DMA and copied old pgtable, but there is no unmapping behaviour for
>>> +  * them, so we need an explicit iotlb flush for the newly-mapped device.
>>> +  * For kdump, at this point, the device is supposed to finish reset at
>>> +  * the driver probe stage, no in-flight DMA will exist, thus we do not
>>> +  * need to worry about that anymore hereafter.
>>>*/
>>> - if (cap_caching_mode(iommu->cap)) {
>>> + if (is_kdump_kernel() || cap_caching_mode(iommu->cap)) {
>>>   iommu->flush.flush_context(iommu, 0,
>>>  (((u16)bus) << 8) | devfn,
>>>  DMA_CCMD_MASK_NOBIT,



Re: [PATCH] iommu/vt-d: Flush old iotlb for kdump when the device gets context mapped

2016-11-16 Thread Xunlei Pang
Ccing David
On 2016/11/16 at 17:02, Xunlei Pang wrote:
> We met the DMAR fault both on hpsa P420i and P421 SmartArray controllers
> under kdump, it can be steadily reproduced on several different machines,
> the dmesg log is like:
> HP HPSA Driver (v 3.4.16-0)
> hpsa 0000:02:00.0: using doorbell to reset controller
> hpsa 0000:02:00.0: board ready after hard reset.
> hpsa 0000:02:00.0: Waiting for controller to respond to no-op
> DMAR: Setting identity map for device 0000:02:00.0 [0xe8000 - 0xe8fff]
> DMAR: Setting identity map for device 0000:02:00.0 [0xf4000 - 0xf4fff]
> DMAR: Setting identity map for device 0000:02:00.0 [0xbdf6e000 - 0xbdf6efff]
> DMAR: Setting identity map for device 0000:02:00.0 [0xbdf6f000 - 0xbdf7efff]
> DMAR: Setting identity map for device 0000:02:00.0 [0xbdf7f000 - 0xbdf82fff]
> DMAR: Setting identity map for device 0000:02:00.0 [0xbdf83000 - 0xbdf84fff]
> DMAR: DRHD: handling fault status reg 2
> DMAR: [DMA Read] Request device [02:00.0] fault addr f000 [fault reason 06] PTE Read access is not set
> hpsa 0000:02:00.0: controller message 03:00 timed out
> hpsa 0000:02:00.0: no-op failed; re-trying
>
> After some debugging, we found that the corresponding pte entry value
> is correct, and the value of the iommu caching mode is 0; the fault is
> probably due to the old iotlb cache of the in-flight DMA.
>
> Thus we need to flush the old iotlb after context mapping is set up for
> the device, where the device is supposed to finish its reset at the
> driver probe stage and no in-flight DMA exists thereafter.
>
> With this patch, all our problematic machines can survive the kdump tests.
>
> CC: Myron Stowe <myron.st...@redhat.com>
> CC: Don Brace <don.br...@microsemi.com>
> CC: Baoquan He <b...@redhat.com>
> CC: Dave Young <dyo...@redhat.com>
> Tested-by: Joseph Szczypek <jszcz...@redhat.com>
> Signed-off-by: Xunlei Pang <xlp...@redhat.com>
> ---
>  drivers/iommu/intel-iommu.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 3965e73..eb79288 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -2067,9 +2067,16 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
>* It's a non-present to present mapping. If hardware doesn't cache
>* non-present entry we only need to flush the write-buffer. If the
>* _does_ cache non-present entries, then it does so in the special
> -  * domain #0, which we have to flush:
> +  * domain #0, which we have to flush.
> +  *
> +  * For kdump cases, present entries may be cached due to the in-flight
> +  * DMA and copied old pgtable, but there is no unmapping behaviour for
> +  * them, so we need an explicit iotlb flush for the newly-mapped device.
> +  * For kdump, at this point, the device is supposed to finish reset at
> +  * the driver probe stage, no in-flight DMA will exist, thus we do not
> +  * need to worry about that anymore hereafter.
>*/
> - if (cap_caching_mode(iommu->cap)) {
> + if (is_kdump_kernel() || cap_caching_mode(iommu->cap)) {
>   iommu->flush.flush_context(iommu, 0,
>  (((u16)bus) << 8) | devfn,
>  DMA_CCMD_MASK_NOBIT,



[PATCH] iommu/vt-d: Flush old iotlb for kdump when the device gets context mapped

2016-11-16 Thread Xunlei Pang
We met the DMAR fault on both hpsa P420i and P421 SmartArray controllers
under kdump; it can be steadily reproduced on several different machines.
The dmesg log is like:
HP HPSA Driver (v 3.4.16-0)
hpsa 0000:02:00.0: using doorbell to reset controller
hpsa 0000:02:00.0: board ready after hard reset.
hpsa 0000:02:00.0: Waiting for controller to respond to no-op
DMAR: Setting identity map for device 0000:02:00.0 [0xe8000 - 0xe8fff]
DMAR: Setting identity map for device 0000:02:00.0 [0xf4000 - 0xf4fff]
DMAR: Setting identity map for device 0000:02:00.0 [0xbdf6e000 - 0xbdf6efff]
DMAR: Setting identity map for device 0000:02:00.0 [0xbdf6f000 - 0xbdf7efff]
DMAR: Setting identity map for device 0000:02:00.0 [0xbdf7f000 - 0xbdf82fff]
DMAR: Setting identity map for device 0000:02:00.0 [0xbdf83000 - 0xbdf84fff]
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Read] Request device [02:00.0] fault addr f000 [fault reason 06] PTE Read access is not set
hpsa 0000:02:00.0: controller message 03:00 timed out
hpsa 0000:02:00.0: no-op failed; re-trying

After some debugging, we found that the corresponding pte entry value
is correct, and the value of the iommu caching mode is 0; the fault is
probably due to the old iotlb cache of the in-flight DMA.

Thus we need to flush the old iotlb after context mapping is set up for
the device, where the device is supposed to finish its reset at the
driver probe stage and no in-flight DMA exists thereafter.

With this patch, all our problematic machines can survive the kdump tests.

CC: Myron Stowe <myron.st...@redhat.com>
CC: Don Brace <don.br...@microsemi.com>
CC: Baoquan He <b...@redhat.com>
CC: Dave Young <dyo...@redhat.com>
Tested-by: Joseph Szczypek <jszcz...@redhat.com>
Signed-off-by: Xunlei Pang <xlp...@redhat.com>
---
 drivers/iommu/intel-iommu.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 3965e73..eb79288 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2067,9 +2067,16 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 * It's a non-present to present mapping. If hardware doesn't cache
 * non-present entry we only need to flush the write-buffer. If the
 * _does_ cache non-present entries, then it does so in the special
-* domain #0, which we have to flush:
+* domain #0, which we have to flush.
+*
+* For kdump cases, present entries may be cached due to the in-flight
+* DMA and copied old pgtable, but there is no unmapping behaviour for
+* them, so we need an explicit iotlb flush for the newly-mapped device.
+* For kdump, at this point, the device is supposed to finish reset at
+* the driver probe stage, no in-flight DMA will exist, thus we do not
+* need to worry about that anymore hereafter.
 */
-   if (cap_caching_mode(iommu->cap)) {
+   if (is_kdump_kernel() || cap_caching_mode(iommu->cap)) {
iommu->flush.flush_context(iommu, 0,
   (((u16)bus) << 8) | devfn,
   DMA_CCMD_MASK_NOBIT,
-- 
1.8.3.1



Re: [PATCH] iommu/vt-d: Fix the size calculation of pasid table

2016-10-31 Thread Xunlei Pang
On 2016/10/30 at 20:18, David Woodhouse wrote:
> On Wed, 2016-10-12 at 13:17 +0100, David Woodhouse wrote:
>> Yes, that looks correct. I think we may also need to limit it, because
>> full 20-bit PASID support means we'll attempt an order 11 allocation.
>> But that's certainly correct for now
> Actually, not quite correct. You fixed the allocation but not the free.
> And Mika had reported that even the *correct* allocation was failing
> because it was too large. So I've done it differently (untested)...

Yes, your fix looks correct.

Thanks,
Xunlei

> -
> Subject: [PATCH] iommu/vt-d: Fix PASID table allocation
>
> Somehow I ended up with an off-by-three error in calculating the size of
> the PASID and PASID State tables, which triggers allocations failures as
> those tables unfortunately have to be physically contiguous.
>
> In fact, even the *correct* maximum size of 8MiB is problematic and is
> wont to lead to allocation failures. Since I have extracted a promise
> that this *will* be fixed in hardware, I'm happy to limit it on the
> current hardware to a maximum of 0x20000 PASIDs, which gives us 1MiB
> tables — still not ideal, but better than before.
>
> Reported by Mika Kuoppala <mika.kuopp...@linux.intel.com> and also by
> Xunlei Pang <xlp...@redhat.com> who submitted a simpler patch to fix
> only the allocation (and not the free) to the "correct" limit... which
> was still problematic.
>
> Signed-off-by: David Woodhouse <dw...@infradead.org>
> ---
>  drivers/iommu/intel-svm.c   | 28 +++++++++++++++++-----------
>  include/linux/intel-iommu.h |  1 +
>  2 files changed, 18 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 8ebb353..cb72e00 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -39,10 +39,18 @@ int intel_svm_alloc_pasid_tables(struct intel_iommu *iommu)
>   struct page *pages;
>   int order;
>  
> - order = ecap_pss(iommu->ecap) + 7 - PAGE_SHIFT;
> - if (order < 0)
> - order = 0;
> -
> + /* Start at 2 because it's defined as 2^(1+PSS) */
> + iommu->pasid_max = 2 << ecap_pss(iommu->ecap);
> +
> + /* Eventually I'm promised we will get a multi-level PASID table
> +  * and it won't have to be physically contiguous. Until then,
> +  * limit the size because 8MiB contiguous allocations can be hard
> +  * to come by. The limit of 0x20000, which is 1MiB for each of
> +  * the PASID and PASID-state tables, is somewhat arbitrary. */
> + if (iommu->pasid_max > 0x20000)
> + iommu->pasid_max = 0x20000;
> +
> + order = get_order(sizeof(struct pasid_entry) * iommu->pasid_max);
>   pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
>   if (!pages) {
>   pr_warn("IOMMU: %s: Failed to allocate PASID table\n",
> @@ -53,6 +61,8 @@ int intel_svm_alloc_pasid_tables(struct intel_iommu *iommu)
>   pr_info("%s: Allocated order %d PASID table.\n", iommu->name, order);
>  
>   if (ecap_dis(iommu->ecap)) {
> + /* Just making it explicit... */
> + BUILD_BUG_ON(sizeof(struct pasid_entry) != sizeof(struct pasid_state_entry));
>   pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
>   if (pages)
>   iommu->pasid_state_table = page_address(pages);
> @@ -68,11 +78,7 @@ int intel_svm_alloc_pasid_tables(struct intel_iommu *iommu)
>  
>  int intel_svm_free_pasid_tables(struct intel_iommu *iommu)
>  {
> - int order;
> -
> - order = ecap_pss(iommu->ecap) + 7 - PAGE_SHIFT;
> - if (order < 0)
> - order = 0;
> + int order = get_order(sizeof(struct pasid_entry) * iommu->pasid_max);
>  
>   if (iommu->pasid_table) {
>   free_pages((unsigned long)iommu->pasid_table, order);
> @@ -371,8 +377,8 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
>   }
>   svm->iommu = iommu;
>  
> - if (pasid_max > 2 << ecap_pss(iommu->ecap))
> - pasid_max = 2 << ecap_pss(iommu->ecap);
> + if (pasid_max > iommu->pasid_max)
> + pasid_max = iommu->pasid_max;
>  
>   /* Do not use PASID 0 in caching mode (virtualised IOMMU) */
> ret = idr_alloc(&iommu->pasid_idr, svm,
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 2d9b6500..d49e26c 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -429,6 +4

Re: [PATCH] iommu/vt-d: Fix the size calculation of pasid table

2016-10-10 Thread Xunlei Pang
Ping David for confirmation

On 2016/09/19 at 20:18, Joerg Roedel wrote:
> [Cc'ing David]
>
> On Mon, Sep 12, 2016 at 10:49:11AM +0800, Xunlei Pang wrote:
>> According to the vt-d spec, the size of a pasid (state) entry is 8B,
>> i.e. 2^3 bytes, and the number of pasid (state) entries is
>> 2^(ecap_pss + 1).
>>
>> Thus the right size of the pasid (state) table, as a power of 2, should
>> be ecap_pss(iommu->ecap) plus "1 + 3 = 4" rather than 7.
>>
>> Signed-off-by: Xunlei Pang <xlp...@redhat.com>
>> ---
>>  drivers/iommu/intel-svm.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
>> index 8ebb353..cfa75c2 100644
>> --- a/drivers/iommu/intel-svm.c
>> +++ b/drivers/iommu/intel-svm.c
>> @@ -39,7 +39,7 @@ int intel_svm_alloc_pasid_tables(struct intel_iommu *iommu)
>>  struct page *pages;
>>  int order;
>>  
>> -order = ecap_pss(iommu->ecap) + 7 - PAGE_SHIFT;
>> +order = ecap_pss(iommu->ecap) + 4 - PAGE_SHIFT;
>>  if (order < 0)
>>  order = 0;
> The patch seems to be correct, but I'll let David comment on it first.
>
>
>
>   Joerg
>



[PATCH] iommu/vt-d: Fix the size calculation of pasid table

2016-09-11 Thread Xunlei Pang
According to the vt-d spec, the size of a pasid (state) entry is 8B,
i.e. 2^3 bytes, and the number of pasid (state) entries is
2^(ecap_pss + 1).

Thus the right size of the pasid (state) table, as a power of 2, should
be ecap_pss(iommu->ecap) plus "1 + 3 = 4" rather than 7.

Signed-off-by: Xunlei Pang <xlp...@redhat.com>
---
 drivers/iommu/intel-svm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 8ebb353..cfa75c2 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -39,7 +39,7 @@ int intel_svm_alloc_pasid_tables(struct intel_iommu *iommu)
struct page *pages;
int order;
 
-   order = ecap_pss(iommu->ecap) + 7 - PAGE_SHIFT;
+   order = ecap_pss(iommu->ecap) + 4 - PAGE_SHIFT;
if (order < 0)
order = 0;
 
-- 
1.8.3.1



Re: [PATCH] iommu/vt-d: Assign old irt entries a common valid vector in kdump kernel

2016-03-02 Thread Xunlei Pang
On 03/02/2016 at 10:58 PM, Joerg Roedel wrote:
> On Wed, Mar 02, 2016 at 06:02:28PM +0800, Xunlei Pang wrote:
>> Currently, the kernel copies the old IRT entries during iommu
>> initialization for kdump, so old vectors from the first kernel are
>> used but have no related kernel irq handlers set explicitly, which
>> can lead to some problems after lapics are enabled:
>>  - When some in-flight dma finished and triggered an interrupt,
>>the kernel will throw a warning message in do_IRQ() like "No
>>irq handler", because handle_irq() will return false with no
>>irq_desc handlers. This may confuse users.
>>  - When the in-flight dma interrupt arrives, and if there happens
>>to be an irq with the same vector allocated in kdump kernel,
>>it will invoke the existing ISR registered in kdump kernel as
>>if one valid interrupt in the kdump kernel happens. This might
>>cause some wrong software logic, for example if the ISR always
>>wakes up a process.
> Hmm, the current situation with misdirected irq messages in the kdump
> kernel is not different from a situation without any iommu at all,
> right?

Right, non-iommu in-flight DMA after crash also suffers from this.
I think both of them should be fixed if possible.

> And the goal of preserving the old mappings is to get as close as
> possible to the situation without iommu. This seems to carry the VT-d
> driver away from that.

Without an iommu, it's not so easy to fix because the MSI registers
are located in different PCI devices. But VT-d introduces a mechanism
that redirects both MSI/MSI-X and I/O APIC interrupts through a common
IR table, so with its help we can handle this much more easily.

On the kdump side, servers with VT-d enabled are becoming increasingly
commonplace; if this does happen in the real world (and it usually
will), it would be hard to track down, so I think it would be better
to fix it.

Also CC kexec list

Regards,
Xunlei

>
>
>   Joerg
>



[PATCH] iommu/vt-d: Assign old irt entries a common valid vector in kdump kernel

2016-03-02 Thread Xunlei Pang
Currently, the kernel copies the old IRT entries during iommu
initialization for kdump, so old vectors from the first kernel are
used but have no related kernel irq handlers set explicitly, which
can lead to some problems after lapics are enabled:
 - When some in-flight dma finished and triggered an interrupt,
   the kernel will throw a warning message in do_IRQ() like "No
   irq handler", because handle_irq() will return false with no
   irq_desc handlers. This may confuse users.
 - When the in-flight dma interrupt arrives, and if there happens
   to be an irq with the same vector allocated in kdump kernel,
   it will invoke the existing ISR registered in kdump kernel as
   if one valid interrupt in the kdump kernel happens. This might
   cause some wrong software logic, for example if the ISR always
   wakes up a process.

This patch addresses the issue by modifying the old IRT entries to use
a specially allocated vector and a related irq handler.

Signed-off-by: Xunlei Pang <xlp...@redhat.com>
---
 drivers/iommu/intel_irq_remapping.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 68 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index ac59692..e044e0f 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -400,6 +400,68 @@ static int set_msi_sid(struct irte *irte, struct pci_dev *dev)
return 0;
 }
 
+/*
+ * Old entries may contain vector data with no related irq handlers
+ * in the new kernel, replace them with this common vector assigned
+ * with related irq handler.
+ */
+static u8 redirected_vector;
+
+static void lapic_mask_ack(struct irq_data *dummy)
+{
+   /* We don't know how to mask it, only do lapic-level ack */
+   ack_APIC_irq();
+}
+
+static struct irq_chip fake_chip = {
+   .irq_mask_ack = lapic_mask_ack,
+};
+
+/* Allocate the common vector for all iommus' old irte */
+static void iommu_alloc_redirected_vector(struct intel_iommu *iommu)
+{
+   struct irq_alloc_info info;
+   int irq;
+
+   if (redirected_vector)
+   return;
+
+   init_irq_alloc_info(&info, NULL);
+   irq = irq_domain_alloc_irqs(x86_vector_domain, 1, -1, &info);
+   if (irq < 0) {
+   pr_err("Failed to alloc redirected vector for %s's old IRTEs\n",
+   iommu->name);
+   return;
+   }
+
+   /*
+* NOTE: we can assign the edge handler here to be shared
+* by all irt entries with the same redirected_vector but
+* different trigger mode, because edge and level handlers
+* behave similarly with disabled irq or no actions.
+*/
+   irq_set_chip_and_handler(irq, &fake_chip, handle_edge_irq);
+   redirected_vector = irqd_cfg(irq_get_irq_data(irq))->vector;
+
+   pr_info("Redirect %s's old IRTEs to use Vector%u and IRQ%d\n",
+   iommu->name, redirected_vector, irq);
+}
+
+/* Make sure we have a valid vector and irq handler after copy */
+static inline void iommu_assign_old_irte(struct ir_table *new_table,
+   struct irte *old_entry, unsigned int i)
+{
+   struct irte *new_entry = &new_table->base[i];
+
+   if (!old_entry->present)
+   return;
+
+   memcpy(new_entry, old_entry, sizeof(struct irte));
+   if (redirected_vector)
+   new_entry->vector = redirected_vector;
+   bitmap_set(new_table->bitmap, i, 1);
+}
+
 static int iommu_load_old_irte(struct intel_iommu *iommu)
 {
struct irte *old_ir_table;
@@ -430,21 +492,17 @@ static int iommu_load_old_irte(struct intel_iommu *iommu)
if (!old_ir_table)
return -ENOMEM;
 
-   /* Copy data over */
-   memcpy(iommu->ir_table->base, old_ir_table, size);
-
-   __iommu_flush_cache(iommu, iommu->ir_table->base, size);
+   iommu_alloc_redirected_vector(iommu);
 
/*
-* Now check the table for used entries and mark those as
-* allocated in the bitmap
+* Copy and adjust old data, and check the table for used entries,
+* and mark those as allocated in the bitmap
 */
-   for (i = 0; i < INTR_REMAP_TABLE_ENTRIES; i++) {
-   if (iommu->ir_table->base[i].present)
-   bitmap_set(iommu->ir_table->bitmap, i, 1);
-   }
+   for (i = 0; i < INTR_REMAP_TABLE_ENTRIES; i++)
+   iommu_assign_old_irte(iommu->ir_table, &old_ir_table[i], i);
 
memunmap(old_ir_table);
+   __iommu_flush_cache(iommu, iommu->ir_table->base, size);
 
return 0;
 }
-- 
2.5.0
