Re: [PATCH] x86/efi: Do not release sub-1MB memory regions when the crashkernel option is specified

2021-04-12 Thread lijiang
Thank you for the comment, H. Peter, Andy and Baoquan.

在 2021年04月12日 17:52, Baoquan He 写道:
> On 04/11/21 at 06:49pm, Andy Lutomirski wrote:
>>
>>
>>> On Apr 11, 2021, at 6:14 PM, Baoquan He  wrote:
>>>
>>> On 04/09/21 at 07:59pm, H. Peter Anvin wrote:
 Why don't we do this unconditionally? At the very best we gain half a 
 megabyte of memory (except the trampoline, which has to live there, but it 
 is only a few kilobytes.)
>>>
>>> This is a great suggestion, thanks. I think we can fix it in this way to
>>> make code simpler. Then the specific caring of real mode in
>>> efi_free_boot_services() can be removed too.
>>>
>>
>> This whole situation makes me think that the code is buggy before and buggy 
>> after.
>>
>> The issue here (I think) is that various pieces of code want to reserve 
>> specific pieces of otherwise-available low memory for their own nefarious 
>> uses. I don’t know *why* crash kernel needs this, but that doesn’t matter 
>> too much.
> 
> Kdump kernel also need go through real mode code path during bootup. It
> is not different than normal kernel except that it skips the firmware
> resetting. So kdump kernel needs low 1M as system RAM just as normal
> kernel does. Here we reserve the whole low 1M with memblock_reserve()
> to avoid any later kernel or driver data reside in this area. Otherwise,
> we need dump the content of this area to vmcore. As we know, when crash
> happened, the old memory of 1st kernel should be untouched until vmcore
> dumping read out its content. Meanwhile, kdump kernel need reuse low 1M.
> In the past, we used a back up region to copy out the low 1M area, and
> map the back up region into the low 1M area in vmcore elf file. In
> 6f599d84231fd27 ("x86/kdump: Always reserve the low 1M when the crashkernel
> option is specified"), we changed to lock the whole low 1M to avoid
> writting any kernel data into, like this we can skip this area when
> dumping vmcore.
> 
> Above is why we try to memblock reserve the whole low 1M. We don't want
> to use it, just don't want anyone to use it in 1st kernel.
> 
>>
>> I propose that the right solution is to give low-memory-reserving code paths 
>> two chances to do what they need: once at the very beginning and once after 
>> EFI boot services are freed.
>>
>> Alternatively, just reserve *all* otherwise unused sub 1M memory up front, 
>> then release it right after releasing boot services, and then invoke the 
>> special cases exactly once.
> 

After EFI boot services are freed, I'm worried that it's a bit late. All sub-1M 
memory regions need to be reserved early as soon as possible.

> I am not sure if I got both suggested ways clearly. They look a little
> complicated in our case. As I explained at above, we want the whole low
> 1M locked up, not one piece or some pieces of it.
> 
>>
>> In either case, the result is that the crashkernel mess gets unified with 
>> the trampoline mess.  One way the result is called twice and needs to be 
>> more careful, and the other way it’s called only once.
>>

That may still have a chance to allocate memory from sub-1M regions at some 
point, because EFI boot services will be freed after EFI enters virtual mode, 
it looks late.

>> Just skipping freeing boot services seems wrong.  It doesn’t unmap boot 
>> services, and skipping that is incorrect, I think. And it seems to result in 
>> a bogus memory map in which the system thinks that some crashkernel memory 
>> is EFI memory instead.
> 
> I like hpa's thought to lock the whole low 1M unconditionally since only
> a few KB except of trampoline area is there. Rethinking about it, doing
> it in can_free_region() may be risky because efi memory region could
> cross the 1M boundary, e.g [640K, 100M] with type of
> EFI_BOOT_SERVICES_CODE|EFI_BOOT_SERVICES_DATA, it could cause loss of memory.

Theoretically, yes. But so far I haven't seen the situation of crossing the 1M 
boundary.

Thanks.
Lianbo

> Just a wild guess, not very sure if the 1M boundary corssing can really
> happen. efi_reserve_boot_services() won't split regions.
> 
> If moving efi_reserve_boot_services() after reserve_real_mode() is not
> accepted, maybe we can call efi_mem_reserve(0, 1M) just as
> efi_esrt_init() has done.
> 



Re: [PATCH] x86/efi: Do not release sub-1MB memory regions when the crashkernel option is specified

2021-04-09 Thread lijiang
Hi, Baoquan
Thank you for the comment.
在 2021年04月09日 20:44, Baoquan He 写道:
> On 04/07/21 at 10:03pm, Lianbo Jiang wrote:
>> Some sub-1MB memory regions may be reserved by EFI boot services, and the
>> memory regions will be released later in the efi_free_boot_services().
>>
>> Currently, always reserve all sub-1MB memory regions when the crashkernel
>> option is specified, but unfortunately EFI boot services may have already
>> reserved some sub-1MB memory regions before the crash_reserve_low_1M() is
>> called, which makes that the crash_reserve_low_1M() only own the
>> remaining sub-1MB memory regions, not all sub-1MB memory regions, because,
>> subsequently EFI boot services will free its own sub-1MB memory regions.
>> Eventually, DMA will be able to allocate memory from the sub-1MB area and
>> cause the following error:
>>
> 
> So this patch is fixing a problem found in crash utility. We ever met
> the similar issue, later fixed by always reserving low 1M in commit
> 6f599d84231fd27 ("x86/kdump: Always reserve the low 1M when the crashkernel
> option is specified"). Seems the commit is not fixing it completely.
> 
Maybe I should add the "Fixes: 6f599d84231f" in front of 'Signed-off-by' as 
below:

Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel 
option is specified")

>> crash> kmem -s |grep invalid
>> kmem: dma-kmalloc-512: slab: d52c40001900 invalid freepointer: 
>> 9403c0067300
>> kmem: dma-kmalloc-512: slab: d52c40001900 invalid freepointer: 
>> 9403c0067300
>> crash> vtop 9403c0067300
>> VIRTUAL   PHYSICAL
>> 9403c0067300  67300   --->The physical address falls into this range 
>> [0x00063000-0x0008efff]
>>
>> kernel debugging log:
>> ...
>> [0.008927] memblock_reserve: [0x0001-0x00013fff] 
>> efi_reserve_boot_services+0x85/0xd0
>> [0.008930] memblock_reserve: [0x00063000-0x0008efff] 
>> efi_reserve_boot_services+0x85/0xd0
>> ...
>> [0.009425] memblock_reserve: [0x-0x000f] 
>> crash_reserve_low_1M+0x2c/0x49
>> ...
>> [0.010586] Zone ranges:
>> [0.010587]   DMA  [mem 0x1000-0x00ff]
>> [0.010589]   DMA32[mem 0x0100-0x]
>> [0.010591]   Normal   [mem 0x0001-0x000c7fff]
>> [0.010593]   Device   empty
>> ...
>> [8.814894] __memblock_free_late: [0x00063000-0x0008efff] 
>> efi_free_boot_services+0x14b/0x23b
>> [8.815793] __memblock_free_late: [0x0001-0x00013fff] 
>> efi_free_boot_services+0x14b/0x23b
> 
> 
> In commit 6f599d84231fd27, we call crash_reserve_low_1M() to lock the
> whole low 1M area if crashkernel is specified in kernel cmdline.
> But earlier efi_reserve_boot_services() invokation will break the
> intention of the whole low 1M reserving. In efi_reserve_boot_services(),
> if any memory under low 1M hasn't been reserved, it will call
> memblock_reserve() to reserve it and leave it to
> efi_free_boot_services() to free.
> 

Good understanding.

> Hi Lianbo,
> 
> Please correct me if I am wrong or anything is missed. IIUC, can we move
> efi_reserve_boot_services() after reserve_real_mode() to fix this bug?

What do you think about the following changes?

patch [1]:

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 5ecd69a48393..c343de3178ec 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1064,12 +1064,6 @@ void __init setup_arch(char **cmdline_p)
efi_esrt_init();
efi_mokvar_table_init();
 
-   /*
-* The EFI specification says that boot service code won't be
-* called after ExitBootServices(). This is, in fact, a lie.
-*/
-   efi_reserve_boot_services();
-
/* preallocate 4k for mptable mpc */
e820__memblock_alloc_reserved_mpc_new();
 
@@ -1087,6 +1081,12 @@ void __init setup_arch(char **cmdline_p)
trim_platform_memory_ranges();
trim_low_memory_range();
 
+   /*
+* The EFI specification says that boot service code won't be
+* called after ExitBootServices(). This is, in fact, a lie.
+*/
+   efi_reserve_boot_services();
+
init_mem_mapping();
 
idt_setup_early_pf();

> Or move reserve_real_mode() before efi_reserve_boot_services() since
> those real mode regions are all under 1M? Assume efi boot code/data

Or patch [2]

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 5ecd69a48393..ceec5af0dfab 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1058,6 +1058,7 @@ void __init setup_arch(char **cmdline_p)
sev_setup_arch();
 
reserve_bios_regions();
+   reserve_real_mode();
 
efi_fake_memmap();
efi_find_mirror();
@@ -1082,8 +1083,6 @@ void __init setup_arch(char **cmdline_p)
(max_pfn_mapped< won't rely on low 1M area any more at this moment.
> 

Re: [PATCH 2/2 v2] iommu: use the __iommu_attach_device() directly for deferred attach

2021-01-21 Thread lijiang
Hi, Christoph

在 2021年01月19日 23:29, Christoph Hellwig 写道:
>> +int iommu_do_deferred_attach(struct device *dev,
>> + struct iommu_domain *domain)
> 
> I'd remove the "do_" from the name, it doesn't really add any value.
> 
OK.

>> +{
>> +const struct iommu_ops *ops = domain->ops;
>> +
>> +if (unlikely(ops->is_attach_deferred &&
>> + ops->is_attach_deferred(domain, dev)))
>> +return __iommu_attach_device(domain, dev);
>> +
>> +return 0;
> 
> Now that everyting is under the static key we don't really need to
> bother with the unlikely micro optimization, do we?
> 
Good understanding. I can remove it, otherwise it should use the likely().

>> +extern int iommu_do_deferred_attach(struct device *dev,
>> +struct iommu_domain *domain);
> 
> No need for the extern.
> 
Sounds good. I will remove the 'extern' when I post again.

Thanks.
Lianbo



Re: [PATCH 1/2 v2] dma-iommu: use static-key to minimize the impact in the fast-path

2021-01-21 Thread lijiang
Hi, Christoph

Thanks for the comment.
在 2021年01月19日 23:26, Christoph Hellwig 写道:
> On Tue, Jan 19, 2021 at 07:16:15PM +0800, Lianbo Jiang wrote:
>> +static DEFINE_STATIC_KEY_FALSE(__deferred_attach);
> Why the strange underscores?  Wouldn't iommu_deferred_attach_enabled

The variable is defined with the static keyword, which indicates that the
variable is only used in the local module(file), and gives a hint explicitly
with the underscore prefix. Anyway, this is my personal opinion.

> be a better name?
> 
It could be a long name?

Thanks.
Lianbo



Re: [PATCH] iommu: check for the deferred attach when attaching a device

2021-01-18 Thread lijiang
在 2021年01月15日 23:15, Robin Murphy 写道:
> On 2021-01-15 14:26, lijiang wrote:
>> Hi, Robin
>>
>> Thank you for the comment.
>>
>> 在 2021年01月13日 01:29, Robin Murphy 写道:
>>> On 2021-01-05 07:52, lijiang wrote:
>>>> 在 2021年01月05日 11:55, lijiang 写道:
>>>>> Hi,
>>>>>
>>>>> Also add Joerg to cc list.
>>>>>
>>>>
>>>> Also add more people to cc list, Jerry Snitselaar and Tom Lendacky.
>>>>
>>>> Thanks.
>>>>
>>>>> Thanks.
>>>>> Lianbo
>>>>> 在 2020年12月26日 13:39, Lianbo Jiang 写道:
>>>>>> Currently, because domain attach allows to be deferred from iommu
>>>>>> driver to device driver, and when iommu initializes, the devices
>>>>>> on the bus will be scanned and the default groups will be allocated.
>>>>>>
>>>>>> Due to the above changes, some devices could be added to the same
>>>>>> group as below:
>>>>>>
>>>>>> [    3.859417] pci :01:00.0: Adding to iommu group 16
>>>>>> [    3.864572] pci :01:00.1: Adding to iommu group 16
>>>>>> [    3.869738] pci :02:00.0: Adding to iommu group 17
>>>>>> [    3.874892] pci :02:00.1: Adding to iommu group 17
>>>>>>
>>>>>> But when attaching these devices, it doesn't allow that a group has
>>>>>> more than one device, otherwise it will return an error. This conflicts
>>>>>> with the deferred attaching. Unfortunately, it has two devices in the
>>>>>> same group for my side, for example:
>>>>>>
>>>>>> [    9.627014] iommu_group_device_count(): device name[0]::01:00.0
>>>>>> [    9.633545] iommu_group_device_count(): device name[1]::01:00.1
>>>>>> ...
>>>>>> [   10.255609] iommu_group_device_count(): device name[0]::02:00.0
>>>>>> [   10.262144] iommu_group_device_count(): device name[1]::02:00.1
>>>>>>
>>>>>> Finally, which caused the failure of tg3 driver when tg3 driver calls
>>>>>> the dma_alloc_coherent() to allocate coherent memory in the 
>>>>>> tg3_test_dma().
>>>>>>
>>>>>> [    9.660310] tg3 :01:00.0: DMA engine test failed, aborting
>>>>>> [    9.754085] tg3: probe of :01:00.0 failed with error -12
>>>>>> [    9.997512] tg3 :01:00.1: DMA engine test failed, aborting
>>>>>> [   10.043053] tg3: probe of :01:00.1 failed with error -12
>>>>>> [   10.288905] tg3 :02:00.0: DMA engine test failed, aborting
>>>>>> [   10.334070] tg3: probe of :02:00.0 failed with error -12
>>>>>> [   10.578303] tg3 :02:00.1: DMA engine test failed, aborting
>>>>>> [   10.622629] tg3: probe of :02:00.1 failed with error -12
>>>>>>
>>>>>> In addition, the similar situations also occur in other drivers such
>>>>>> as the bnxt_en driver. That can be reproduced easily in kdump kernel
>>>>>> when SME is active.
>>>>>>
>>>>>> Add a check for the deferred attach in the iommu_attach_device() and
>>>>>> allow to attach the deferred device regardless of how many devices
>>>>>> are in a group.
>>>
>>> Is this iommu_attach_device() call is coming from iommu-dma? (if not, then 
>>> whoever's calling it probably shouldn't be)
>>>
>>
>> Yes, you are right, the iommu_attach_device call is coming from iommu-dma.
>>  
>>> Assuming so, then probably what should happen is to move the handling 
>>> currently in iommu_dma_deferred_attach() into the core so that it can call 
>>> __iommu_attach_device() directly - the intent is just to replay that exact 
>>> call skipped in iommu_group_add_device(), so the legacy external 
>>> iommu_attach_device() interface isn't really the right tool for the job
>>
>> Sounds good. I will check if this can work in various cases. If it's OK, I 
>> will post again.
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index f0305e6aac1b..5e7da902ac36 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -23,7 +23,6 @@
>>   #include 
>>   #include 
>>   #include 
>> -#include 
>>   #include 
>>     stru

Re: [PATCH] iommu: check for the deferred attach when attaching a device

2021-01-15 Thread lijiang
Hi, Robin

Thank you for the comment.

在 2021年01月13日 01:29, Robin Murphy 写道:
> On 2021-01-05 07:52, lijiang wrote:
>> 在 2021年01月05日 11:55, lijiang 写道:
>>> Hi,
>>>
>>> Also add Joerg to cc list.
>>>
>>
>> Also add more people to cc list, Jerry Snitselaar and Tom Lendacky.
>>
>> Thanks.
>>
>>> Thanks.
>>> Lianbo
>>> 在 2020年12月26日 13:39, Lianbo Jiang 写道:
>>>> Currently, because domain attach allows to be deferred from iommu
>>>> driver to device driver, and when iommu initializes, the devices
>>>> on the bus will be scanned and the default groups will be allocated.
>>>>
>>>> Due to the above changes, some devices could be added to the same
>>>> group as below:
>>>>
>>>> [    3.859417] pci :01:00.0: Adding to iommu group 16
>>>> [    3.864572] pci :01:00.1: Adding to iommu group 16
>>>> [    3.869738] pci :02:00.0: Adding to iommu group 17
>>>> [    3.874892] pci :02:00.1: Adding to iommu group 17
>>>>
>>>> But when attaching these devices, it doesn't allow that a group has
>>>> more than one device, otherwise it will return an error. This conflicts
>>>> with the deferred attaching. Unfortunately, it has two devices in the
>>>> same group for my side, for example:
>>>>
>>>> [    9.627014] iommu_group_device_count(): device name[0]::01:00.0
>>>> [    9.633545] iommu_group_device_count(): device name[1]::01:00.1
>>>> ...
>>>> [   10.255609] iommu_group_device_count(): device name[0]::02:00.0
>>>> [   10.262144] iommu_group_device_count(): device name[1]::02:00.1
>>>>
>>>> Finally, which caused the failure of tg3 driver when tg3 driver calls
>>>> the dma_alloc_coherent() to allocate coherent memory in the tg3_test_dma().
>>>>
>>>> [    9.660310] tg3 :01:00.0: DMA engine test failed, aborting
>>>> [    9.754085] tg3: probe of :01:00.0 failed with error -12
>>>> [    9.997512] tg3 :01:00.1: DMA engine test failed, aborting
>>>> [   10.043053] tg3: probe of :01:00.1 failed with error -12
>>>> [   10.288905] tg3 :02:00.0: DMA engine test failed, aborting
>>>> [   10.334070] tg3: probe of :02:00.0 failed with error -12
>>>> [   10.578303] tg3 :02:00.1: DMA engine test failed, aborting
>>>> [   10.622629] tg3: probe of :02:00.1 failed with error -12
>>>>
>>>> In addition, the similar situations also occur in other drivers such
>>>> as the bnxt_en driver. That can be reproduced easily in kdump kernel
>>>> when SME is active.
>>>>
>>>> Add a check for the deferred attach in the iommu_attach_device() and
>>>> allow to attach the deferred device regardless of how many devices
>>>> are in a group.
> 
> Is this iommu_attach_device() call is coming from iommu-dma? (if not, then 
> whoever's calling it probably shouldn't be)
> 

Yes, you are right, the iommu_attach_device call is coming from iommu-dma.
 
> Assuming so, then probably what should happen is to move the handling 
> currently in iommu_dma_deferred_attach() into the core so that it can call 
> __iommu_attach_device() directly - the intent is just to replay that exact 
> call skipped in iommu_group_add_device(), so the legacy external 
> iommu_attach_device() interface isn't really the right tool for the job 

Sounds good. I will check if this can work in various cases. If it's OK, I will 
post again.

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f0305e6aac1b..5e7da902ac36 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -23,7 +23,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 struct iommu_dma_msi_page {
@@ -378,21 +377,6 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
return iova_reserve_iommu_regions(dev, domain);
 }
 
-static int iommu_dma_deferred_attach(struct device *dev,
-   struct iommu_domain *domain)
-{
-   const struct iommu_ops *ops = domain->ops;
-
-   if (!is_kdump_kernel())
-   return 0;
-
-   if (unlikely(ops->is_attach_deferred &&
-   ops->is_attach_deferred(domain, dev)))
-   return iommu_attach_device(domain, dev);
-
-   return 0;
-}
-
 /**
  * dma_info_to_prot - Translate DMA API directions and attributes to IOMMU API
  *page flags.
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index ffeebda8d6de..4fed1567b498 100644
--- a/driver

Re: [PATCH] iommu: check for the deferred attach when attaching a device

2021-01-04 Thread lijiang
在 2021年01月05日 11:55, lijiang 写道:
> Hi,
> 
> Also add Joerg to cc list.
> 

Also add more people to cc list, Jerry Snitselaar and Tom Lendacky.

Thanks.

> Thanks.
> Lianbo
> 在 2020年12月26日 13:39, Lianbo Jiang 写道:
>> Currently, because domain attach allows to be deferred from iommu
>> driver to device driver, and when iommu initializes, the devices
>> on the bus will be scanned and the default groups will be allocated.
>>
>> Due to the above changes, some devices could be added to the same
>> group as below:
>>
>> [3.859417] pci :01:00.0: Adding to iommu group 16
>> [3.864572] pci :01:00.1: Adding to iommu group 16
>> [3.869738] pci :02:00.0: Adding to iommu group 17
>> [3.874892] pci :02:00.1: Adding to iommu group 17
>>
>> But when attaching these devices, it doesn't allow that a group has
>> more than one device, otherwise it will return an error. This conflicts
>> with the deferred attaching. Unfortunately, it has two devices in the
>> same group for my side, for example:
>>
>> [9.627014] iommu_group_device_count(): device name[0]::01:00.0
>> [9.633545] iommu_group_device_count(): device name[1]::01:00.1
>> ...
>> [   10.255609] iommu_group_device_count(): device name[0]::02:00.0
>> [   10.262144] iommu_group_device_count(): device name[1]::02:00.1
>>
>> Finally, which caused the failure of tg3 driver when tg3 driver calls
>> the dma_alloc_coherent() to allocate coherent memory in the tg3_test_dma().
>>
>> [9.660310] tg3 :01:00.0: DMA engine test failed, aborting
>> [9.754085] tg3: probe of :01:00.0 failed with error -12
>> [9.997512] tg3 :01:00.1: DMA engine test failed, aborting
>> [   10.043053] tg3: probe of :01:00.1 failed with error -12
>> [   10.288905] tg3 :02:00.0: DMA engine test failed, aborting
>> [   10.334070] tg3: probe of :02:00.0 failed with error -12
>> [   10.578303] tg3 :02:00.1: DMA engine test failed, aborting
>> [   10.622629] tg3: probe of :02:00.1 failed with error -12
>>
>> In addition, the similar situations also occur in other drivers such
>> as the bnxt_en driver. That can be reproduced easily in kdump kernel
>> when SME is active.
>>
>> Add a check for the deferred attach in the iommu_attach_device() and
>> allow to attach the deferred device regardless of how many devices
>> are in a group.
>>
>> Signed-off-by: Lianbo Jiang 
>> ---
>>  drivers/iommu/iommu.c | 5 -
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index ffeebda8d6de..dccab7b133fb 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1967,8 +1967,11 @@ int iommu_attach_device(struct iommu_domain *domain, 
>> struct device *dev)
>>   */
>>  mutex_lock(>mutex);
>>  ret = -EINVAL;
>> -if (iommu_group_device_count(group) != 1)
>> +if (!iommu_is_attach_deferred(domain, dev) &&
>> +iommu_group_device_count(group) != 1) {
>> +dev_err_ratelimited(dev, "Group has more than one device\n");
>>  goto out_unlock;
>> +}
>>  
>>  ret = __iommu_attach_group(domain, group);
>>  
>>



Re: [PATCH] iommu: check for the deferred attach when attaching a device

2021-01-04 Thread lijiang
Hi,

Also add Joerg to cc list.

Thanks.
Lianbo
在 2020年12月26日 13:39, Lianbo Jiang 写道:
> Currently, because domain attach allows to be deferred from iommu
> driver to device driver, and when iommu initializes, the devices
> on the bus will be scanned and the default groups will be allocated.
> 
> Due to the above changes, some devices could be added to the same
> group as below:
> 
> [3.859417] pci :01:00.0: Adding to iommu group 16
> [3.864572] pci :01:00.1: Adding to iommu group 16
> [3.869738] pci :02:00.0: Adding to iommu group 17
> [3.874892] pci :02:00.1: Adding to iommu group 17
> 
> But when attaching these devices, it doesn't allow that a group has
> more than one device, otherwise it will return an error. This conflicts
> with the deferred attaching. Unfortunately, it has two devices in the
> same group for my side, for example:
> 
> [9.627014] iommu_group_device_count(): device name[0]::01:00.0
> [9.633545] iommu_group_device_count(): device name[1]::01:00.1
> ...
> [   10.255609] iommu_group_device_count(): device name[0]::02:00.0
> [   10.262144] iommu_group_device_count(): device name[1]::02:00.1
> 
> Finally, which caused the failure of tg3 driver when tg3 driver calls
> the dma_alloc_coherent() to allocate coherent memory in the tg3_test_dma().
> 
> [9.660310] tg3 :01:00.0: DMA engine test failed, aborting
> [9.754085] tg3: probe of :01:00.0 failed with error -12
> [9.997512] tg3 :01:00.1: DMA engine test failed, aborting
> [   10.043053] tg3: probe of :01:00.1 failed with error -12
> [   10.288905] tg3 :02:00.0: DMA engine test failed, aborting
> [   10.334070] tg3: probe of :02:00.0 failed with error -12
> [   10.578303] tg3 :02:00.1: DMA engine test failed, aborting
> [   10.622629] tg3: probe of :02:00.1 failed with error -12
> 
> In addition, the similar situations also occur in other drivers such
> as the bnxt_en driver. That can be reproduced easily in kdump kernel
> when SME is active.
> 
> Add a check for the deferred attach in the iommu_attach_device() and
> allow to attach the deferred device regardless of how many devices
> are in a group.
> 
> Signed-off-by: Lianbo Jiang 
> ---
>  drivers/iommu/iommu.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index ffeebda8d6de..dccab7b133fb 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1967,8 +1967,11 @@ int iommu_attach_device(struct iommu_domain *domain, 
> struct device *dev)
>*/
>   mutex_lock(>mutex);
>   ret = -EINVAL;
> - if (iommu_group_device_count(group) != 1)
> + if (!iommu_is_attach_deferred(domain, dev) &&
> + iommu_group_device_count(group) != 1) {
> + dev_err_ratelimited(dev, "Group has more than one device\n");
>   goto out_unlock;
> + }
>  
>   ret = __iommu_attach_group(domain, group);
>  
> 



Re: [PATCH v3 1/1] kdump: append uts_namespace.name offset to VMCOREINFO

2020-10-01 Thread lijiang
Hi, Alexander

在 2020年09月30日 18:23, Alexander Egorenkov 写道:
> The offset of the field 'init_uts_ns.name' has changed
> since commit 9a56493f6942 ("uts: Use generic ns_common::count").
> 
> Link: 
> https://lore.kernel.org/r/159644978167.604812.1773586504374412107.stgit@localhost.localdomain
> 
> Make the offset of the field 'uts_namespace.name' available
> in VMCOREINFO because tools like 'crash-utility' and
> 'makedumpfile' must be able to read it from crash dumps.
> 
> Signed-off-by: Alexander Egorenkov 
> ---
> 
> v2 -> v3:
>  * Added documentation to vmcoreinfo.rst
>  * Use the short form of the commit reference
> 
> v1 -> v2:
>  * Improved commit message
>  * Added link to the discussion of the uts namespace changes
> 
>  Documentation/admin-guide/kdump/vmcoreinfo.rst | 6 ++
>  kernel/crash_core.c| 1 +
>  2 files changed, 7 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst 
> b/Documentation/admin-guide/kdump/vmcoreinfo.rst
> index e44a6c01f336..3861a25faae1 100644
> --- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
> +++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
> @@ -39,6 +39,12 @@ call.
>  User-space tools can get the kernel name, host name, kernel release
>  number, kernel version, architecture name and OS type from it.
>  
> +(uts_namespace, name)
> +-
> +
> +Offset of the name's member. Crash Utility and Makedumpfile get
> +the start address of the init_uts_ns.name from this.
> +

Thank you for the update. The v3 looks good to me.

>  node_online_map
>  ---
>  
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 106e4500fd53..173fdc261882 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -447,6 +447,7 @@ static int __init crash_save_vmcoreinfo_init(void)
>   VMCOREINFO_PAGESIZE(PAGE_SIZE);
>  
>   VMCOREINFO_SYMBOL(init_uts_ns);
> + VMCOREINFO_OFFSET(uts_namespace, name);
>   VMCOREINFO_SYMBOL(node_online_map);
>  #ifdef CONFIG_MMU
>   VMCOREINFO_SYMBOL_ARRAY(swapper_pg_dir);
> 



Re: [PATCH v2 1/1] kdump: append uts_namespace.name offset to VMCOREINFO

2020-09-30 Thread lijiang
Hi, Alexander

在 2020年09月24日 20:46, Alexander Egorenkov 写道:
> The offset of the field 'init_uts_ns.name' has changed
> since
> 
> commit 9a56493f6942c0e2df1579986128721da96e00d8
> Author: Kirill Tkhai 
> Date:   Mon Aug 3 13:16:21 2020 +0300
> 
> uts: Use generic ns_common::count
> 
> Link: 
> https://lore.kernel.org/r/159644978167.604812.1773586504374412107.stgit@localhost.localdomain
> 
> Make the offset of the field 'uts_namespace.name' available
> in VMCOREINFO because tools like 'crash-utility' and
> 'makedumpfile' must be able to read it from crash dumps.
> 
> Signed-off-by: Alexander Egorenkov 
> ---
>  kernel/crash_core.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 106e4500fd53..173fdc261882 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -447,6 +447,7 @@ static int __init crash_save_vmcoreinfo_init(void)
>   VMCOREINFO_PAGESIZE(PAGE_SIZE);
>  
>   VMCOREINFO_SYMBOL(init_uts_ns);
> + VMCOREINFO_OFFSET(uts_namespace, name);

Since the new symbol is exported, would you mind adding it to 
Documentation/kdump/vmcoreinfo.txt ?

Thanks.
Lianbo
>   VMCOREINFO_SYMBOL(node_online_map);
>  #ifdef CONFIG_MMU
>   VMCOREINFO_SYMBOL_ARRAY(swapper_pg_dir);
> 



Re: [PATCH v2] docs: admin-guide: update kdump documentation due to change of crash URL

2020-09-23 Thread lijiang
Since crash utility has been moved to github, the original URL is no
longer available. Let's update it accordingly.

Suggested-by: Dave Young 
Signed-off-by: Lianbo Jiang 
---
 Documentation/admin-guide/kdump/kdump.rst | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kdump/kdump.rst 
b/Documentation/admin-guide/kdump/kdump.rst
index 2da65fef2a1c..75a9dd98e76e 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -509,9 +509,12 @@ ELF32-format headers using the --elf32-core-headers kernel 
option on the
 dump kernel.
 
 You can also use the Crash utility to analyze dump files in Kdump
-format. Crash is available on Dave Anderson's site at the following URL:
+format. Crash is available at the following URL:
 
-   http://people.redhat.com/~anderson/
+   https://github.com/crash-utility/crash
+
+Crash document can be found at:
+   https://crash-utility.github.io/
 
 Trigger Kdump on WARN()
 ===
-- 
2.17.1



Re: [PATCH] docs: admin-guide: update kdump documentation due to change of crash URL

2020-09-22 Thread lijiang
在 2020年09月18日 16:09, Lianbo Jiang 写道:
> Since crash utility has moved to github, the original URL is no longer
   ^
  has been moved to github

Because of the above mistake, I'd like to correct it and reply it with the v2.

Thanks.

> available. Let's update it accordingly.
> 
> Suggested-by: Dave Young 
> Signed-off-by: Lianbo Jiang 
> ---
>  Documentation/admin-guide/kdump/kdump.rst | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kdump/kdump.rst 
> b/Documentation/admin-guide/kdump/kdump.rst
> index 2da65fef2a1c..75a9dd98e76e 100644
> --- a/Documentation/admin-guide/kdump/kdump.rst
> +++ b/Documentation/admin-guide/kdump/kdump.rst
> @@ -509,9 +509,12 @@ ELF32-format headers using the --elf32-core-headers 
> kernel option on the
>  dump kernel.
>  
>  You can also use the Crash utility to analyze dump files in Kdump
> -format. Crash is available on Dave Anderson's site at the following URL:
> +format. Crash is available at the following URL:
>  
> -   http://people.redhat.com/~anderson/
> +   https://github.com/crash-utility/crash
> +
> +Crash document can be found at:
> +   https://crash-utility.github.io/
>  
>  Trigger Kdump on WARN()
>  ===
> 



Re: [PATCH v3 3/3] printk: use the lockless ringbuffer

2020-07-07 Thread lijiang
在 2020年07月03日 19:54, John Ogness 写道:
> On 2020-07-02, lijiang  wrote:
>> About the VMCOREINFO part, I made some tests based on the kernel patch
>> v3, the makedumpfile and crash-utility can work as expected with your
>> patch(userspace patch), but, unfortunately, the
>> vmcore-dmesg(kexec-tools) can't correctly read the printk ring buffer
>> information, and get the following error:
>>
>> "Missing the log_buf symbol"
>>
>> The kexec-tools(vmcore-dmesg) should also have a similar patch, just like
>> in the makedumpfile and crash-utility.
> 
> A patched kexec-tools is available here [0].
> 
> I did not test using 32-bit dumps on 64-bit machines and vice versa. But
> it should work.
> 
> John Ogness
> 
> [0] https://github.com/Linutronix/kexec-tools.git (printk branch)
> 

After applying this patch, the vmcore-dmesg can work.

Thank you, John Ogness.



Re: [PATCH v3 3/3] printk: use the lockless ringbuffer

2020-07-03 Thread lijiang
在 2020年07月02日 21:31, Petr Mladek 写道:
> On Thu 2020-07-02 17:43:22, lijiang wrote:
>> 在 2020年07月02日 17:02, John Ogness 写道:
>>> On 2020-07-02, lijiang  wrote:
>>>> About the VMCOREINFO part, I made some tests based on the kernel patch
>>>> v3, the makedumpfile and crash-utility can work as expected with your
>>>> patch(userspace patch), but, unfortunately, the vmcore-dmesg(kexec-tools)
>>>> can't correctly read the printk ring buffer information, and get the
>>>> following error:
>>>>
>>>> "Missing the log_buf symbol"
>>>>
>>>> The kexec-tools(vmcore-dmesg) should also have a similar patch, just like
>>>> in the makedumpfile and crash-utility.
>>>
>>> Yes, a patch for this is needed (as well as for any other related
>>> software floating around the internet).
>>>
>>> I have no RFC patches for vmcore-dmesg. Looking at the code, I think it
>>> would be quite straight forward to port the makedumpfile patch. I will
>>
>> Yes, it should be a similar patch.
>>
>>> try to make some time for this.
>>>
>> That would be nice. Thank you, John Ogness.
>>
>>> I do not want to patch any other software for this. I think with 3
>>> examples (crash, makedumpfile, vmcore-dmesg), others should be able to
>>
>> It's good enough to have the patch for the makedumpfile, crash and 
>> vmcore-dmesg,
>> which can ensure the kdump(userspace) work well.
> 
> I agree that this three are the most important ones and should be
> enough.
> 
> Thanks a lot for working on it and testing it.
> 
My pleasure. I will test the vmcore-dmesg later.

Thanks.
Lianbo

> Best Regards,
> Petr
> 



Re: [PATCH v3 3/3] printk: use the lockless ringbuffer

2020-07-02 Thread lijiang
在 2020年07月02日 17:02, John Ogness 写道:
> On 2020-07-02, lijiang  wrote:
>> About the VMCOREINFO part, I made some tests based on the kernel patch
>> v3, the makedumpfile and crash-utility can work as expected with your
>> patch(userspace patch), but, unfortunately, the vmcore-dmesg(kexec-tools)
>> can't correctly read the printk ring buffer information, and get the
>> following error:
>>
>> "Missing the log_buf symbol"
>>
>> The kexec-tools(vmcore-dmesg) should also have a similar patch, just like
>> in the makedumpfile and crash-utility.
> 
> Yes, a patch for this is needed (as well as for any other related
> software floating around the internet).
> 
> I have no RFC patches for vmcore-dmesg. Looking at the code, I think it
> would be quite straight forward to port the makedumpfile patch. I will

Yes, it should be a similar patch.

> try to make some time for this.
> 
That would be nice. Thank you, John Ogness.

> I do not want to patch any other software for this. I think with 3
> examples (crash, makedumpfile, vmcore-dmesg), others should be able to

It's good enough to have the patch for the makedumpfile, crash and vmcore-dmesg,
which can ensure the kdump(userspace) work well.

Thanks.
Lianbo

> implement the changes to their software without needing my help.
> 
> John Ogness
> 



Re: [PATCH v3 3/3] printk: use the lockless ringbuffer

2020-07-02 Thread lijiang
Hi, John Ogness

About the VMCOREINFO part, I made some tests based on the kernel patch
v3, the makedumpfile and crash-utility can work as expected with your
patch(userspace patch), but, unfortunately, the vmcore-dmesg(kexec-tools)
can't correctly read the printk ring buffer information, and get the
following error:

"Missing the log_buf symbol"

The kexec-tools(vmcore-dmesg) should also have a similar patch, just like
in the makedumpfile and crash-utility.

BTW: If you already have a patch for the kexec-tools, would you mind sharing
it? I will make a test for the vmcore-dmesg.

Thanks.
Lianbo

在 2020年06月18日 22:49, John Ogness 写道:
> Replace the existing ringbuffer usage and implementation with
> lockless ringbuffer usage. Even though the new ringbuffer does not
> require locking, all existing locking is left in place. Therefore,
> this change is purely replacing the underlining ringbuffer.
> 
> Changes that exist due to the ringbuffer replacement:
> 
> - The VMCOREINFO has been updated for the new structures.
> 
> - Dictionary data is now stored in a separate data buffer from the
>   human-readable messages. The dictionary data buffer is set to the
>   same size as the message buffer. Therefore, the total required
>   memory for both dictionary and message data is
>   2 * (2 ^ CONFIG_LOG_BUF_SHIFT) for the initial static buffers and
>   2 * log_buf_len (the kernel parameter) for the dynamic buffers.
> 
> - Record meta-data is now stored in a separate array of descriptors.
>   This is an additional 72 * (2 ^ (CONFIG_LOG_BUF_SHIFT - 5)) bytes
>   for the static array and 72 * (log_buf_len >> 5) bytes for the
>   dynamic array.
> 
> Signed-off-by: John Ogness 
> ---
>  include/linux/kmsg_dump.h |   2 -
>  kernel/printk/printk.c| 944 --
>  2 files changed, 497 insertions(+), 449 deletions(-)
> 
> diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h
> index 3378bcbe585e..c9b0abe5ca91 100644
> --- a/include/linux/kmsg_dump.h
> +++ b/include/linux/kmsg_dump.h
> @@ -45,8 +45,6 @@ struct kmsg_dumper {
>   bool registered;
>  
>   /* private state of the kmsg iterator */
> - u32 cur_idx;
> - u32 next_idx;
>   u64 cur_seq;
>   u64 next_seq;
>  };
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 8c14835be46c..7642ef634956 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -55,6 +55,7 @@
>  #define CREATE_TRACE_POINTS
>  #include 
>  
> +#include "printk_ringbuffer.h"
>  #include "console_cmdline.h"
>  #include "braille.h"
>  #include "internal.h"
> @@ -294,30 +295,24 @@ enum con_msg_format_flags {
>  static int console_msg_format = MSG_FORMAT_DEFAULT;
>  
>  /*
> - * The printk log buffer consists of a chain of concatenated variable
> - * length records. Every record starts with a record header, containing
> - * the overall length of the record.
> + * The printk log buffer consists of a sequenced collection of records, each
> + * containing variable length message and dictionary text. Every record
> + * also contains its own meta-data (@info).
>   *
> - * The heads to the first and last entry in the buffer, as well as the
> - * sequence numbers of these entries are maintained when messages are
> - * stored.
> + * Every record meta-data carries the timestamp in microseconds, as well as
> + * the standard userspace syslog level and syslog facility. The usual kernel
> + * messages use LOG_KERN; userspace-injected messages always carry a matching
> + * syslog facility, by default LOG_USER. The origin of every message can be
> + * reliably determined that way.
>   *
> - * If the heads indicate available messages, the length in the header
> - * tells the start next message. A length == 0 for the next message
> - * indicates a wrap-around to the beginning of the buffer.
> + * The human readable log message of a record is available in @text, the
> + * length of the message text in @text_len. The stored message is not
> + * terminated.
>   *
> - * Every record carries the monotonic timestamp in microseconds, as well as
> - * the standard userspace syslog level and syslog facility. The usual
> - * kernel messages use LOG_KERN; userspace-injected messages always carry
> - * a matching syslog facility, by default LOG_USER. The origin of every
> - * message can be reliably determined that way.
> - *
> - * The human readable log message directly follows the message header. The
> - * length of the message text is stored in the header, the stored message
> - * is not terminated.
> - *
> - * Optionally, a message can carry a dictionary of properties (key/value 
> pairs),
> - * to provide userspace with a machine-readable message context.
> + * Optionally, a record can carry a dictionary of properties (key/value
> + * pairs), to provide userspace with a machine-readable message context. The
> + * length of the dictionary is available in @dict_len. The dictionary is not
> + * terminated.
>   *
>   * Examples for 

Re: [PATCH v2] kexec: Do not verify the signature without the lockdown or mandatory signature

2020-06-18 Thread lijiang
在 2020年06月18日 03:37, Andrew Morton 写道:
> On Tue,  2 Jun 2020 12:59:52 +0800 Lianbo Jiang  wrote:
> 
>> Signature verification is an important security feature, to protect
>> system from being attacked with a kernel of unknown origin. Kexec
>> rebooting is a way to replace the running kernel, hence need be
>> secured carefully.
> 
> I'm finding this changelog quite hard to understand,
> 
Thanks for your comment.

I will improve the patch log and try to make it easily understand.

>> In the current code of handling signature verification of kexec kernel,
>> the logic is very twisted. It mixes signature verification, IMA signature
>> appraising and kexec lockdown.
>>
>> If there is no KEXEC_SIG_FORCE, kexec kernel image doesn't have one of
>> signature, the supported crypto, and key, we don't think this is wrong,
> 
> I think this is saying that in the absence of KEXEC_SIG_FORCE and if
> the signature/crypto/key are all incorrect, the kexec still succeeds,
> but it should not.
> 
When the KEXEC_SIG_FORCE is not enabled, even if kexec kernel image doesn't
have the signature, or the key, etc, kexec should be still allowed to loaded,
unless kexec lockdown is executed.

>> Unless kexec lockdown is executed. IMA is considered as another kind of
>> signature appraising method.
>>
>> If kexec kernel image has signature/crypto/key, it has to go through the
>> signature verification and pass. Otherwise it's seen as verification
>> failure, and won't be loaded.
> 
> I don't know if this is describing the current situation or the
> post-patch situation.
> 
This is the current situation, and we'd like to change it so that kexec allows
the kernel and initrd images to be loaded when they are not the lockdown or 
mandatory signature.

>> Seems kexec kernel image with an unqualified signature is even worse than
>> those w/o signature at all, this sounds very unreasonable. E.g. If people
>> get a unsigned kernel to load, or a kernel signed with expired key, which
>> one is more dangerous?
>>
>> So, here, let's simplify the logic to improve code readability. If the
>> KEXEC_SIG_FORCE enabled or kexec lockdown enabled, signature verification
>> is mandated. Otherwise, we lift the bar for any kernel image.
> 
> I think the whole thing needs a rewrite.  Start out by fully describing
> the current situation.  THen describe what is wrong with it, and why. 
> Then describe the proposed change.  Or something along these lines.
> 
> The changelog should also make clear the end-user impact of the patch. 
> In sufficient detail for others to decide which kernel version(s)
> should be patched.  Your recommendations will also be valuable - which
> kernel version(s) do you think should be patched, and why?
> 

Currently, kernel will always verify the signature without the lockdown or
mandatory signature. This may prevent the kernel from loading the kernel and
initrd images via the kexec_file_load() syscall. However, we'd like to allow
to still load the images in such case rather than failure due to the signature
verification issue.

For example, at the stage of development and test, usually use a signature
key to test whether the procedure of signature can work well as expected.
Sometimes, the signing time may be expired, but still use the kernel with
the old signature key to reproduce some problems in some automatic tests,
which always caused the failure of loading images.

Let's clean the logic of kernel code and allow to still load the kernel and
initrd images without the lockdown or mandatory signature.


Hope this helps.

Thanks.
Lianbo



Re: [PATCH v2] kexec: Do not verify the signature without the lockdown or mandatory signature

2020-06-10 Thread lijiang


I just noticed that I forgot to add Eric Biederman in cc list, so sorry for 
this.

Thanks.
Lianbo

在 2020年06月02日 12:59, Lianbo Jiang 写道:
> Signature verification is an important security feature, to protect
> system from being attacked with a kernel of unknown origin. Kexec
> rebooting is a way to replace the running kernel, hence need be
> secured carefully.
> 
> In the current code of handling signature verification of kexec kernel,
> the logic is very twisted. It mixes signature verification, IMA signature
> appraising and kexec lockdown.
> 
> If there is no KEXEC_SIG_FORCE, kexec kernel image doesn't have one of
> signature, the supported crypto, and key, we don't think this is wrong,
> Unless kexec lockdown is executed. IMA is considered as another kind of
> signature appraising method.
> 
> If kexec kernel image has signature/crypto/key, it has to go through the
> signature verification and pass. Otherwise it's seen as verification
> failure, and won't be loaded.
> 
> Seems kexec kernel image with an unqualified signature is even worse than
> those w/o signature at all, this sounds very unreasonable. E.g. If people
> get a unsigned kernel to load, or a kernel signed with expired key, which
> one is more dangerous?
> 
> So, here, let's simplify the logic to improve code readability. If the
> KEXEC_SIG_FORCE enabled or kexec lockdown enabled, signature verification
> is mandated. Otherwise, we lift the bar for any kernel image.
> 
> Signed-off-by: Lianbo Jiang 
> ---
> Changes since v1:
> [1] Modify the log level(suggested by Jiri Bohac)
> 
>  kernel/kexec_file.c | 34 ++
>  1 file changed, 6 insertions(+), 28 deletions(-)
> 
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index faa74d5f6941..fae496958a68 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -181,34 +181,19 @@ void kimage_file_post_load_cleanup(struct kimage *image)
>  static int
>  kimage_validate_signature(struct kimage *image)
>  {
> - const char *reason;
>   int ret;
>  
>   ret = arch_kexec_kernel_verify_sig(image, image->kernel_buf,
>  image->kernel_buf_len);
> - switch (ret) {
> - case 0:
> - break;
> + if (ret) {
>  
> - /* Certain verification errors are non-fatal if we're not
> -  * checking errors, provided we aren't mandating that there
> -  * must be a valid signature.
> -  */
> - case -ENODATA:
> - reason = "kexec of unsigned image";
> - goto decide;
> - case -ENOPKG:
> - reason = "kexec of image with unsupported crypto";
> - goto decide;
> - case -ENOKEY:
> - reason = "kexec of image with unavailable key";
> - decide:
>   if (IS_ENABLED(CONFIG_KEXEC_SIG_FORCE)) {
> - pr_notice("%s rejected\n", reason);
> + pr_notice("Enforced kernel signature verification 
> failed (%d).\n", ret);
>   return ret;
>   }
>  
> - /* If IMA is guaranteed to appraise a signature on the kexec
> + /*
> +  * If IMA is guaranteed to appraise a signature on the kexec
>* image, permit it even if the kernel is otherwise locked
>* down.
>*/
> @@ -216,17 +201,10 @@ kimage_validate_signature(struct kimage *image)
>   security_locked_down(LOCKDOWN_KEXEC))
>   return -EPERM;
>  
> - return 0;
> -
> - /* All other errors are fatal, including nomem, unparseable
> -  * signatures and signature check failures - even if signatures
> -  * aren't required.
> -  */
> - default:
> - pr_notice("kernel signature verification failed (%d).\n", ret);
> + pr_debug("kernel signature verification failed (%d).\n", ret);
>   }
>  
> - return ret;
> + return 0;
>  }
>  #endif
>  
> 



Re: [PATCH] kexec: Do not verify the signature without the lockdown or mandatory signature

2020-05-26 Thread lijiang
在 2020年05月27日 11:15, lijiang 写道:
> 在 2020年05月26日 21:59, Jiri Bohac 写道:
>> On Mon, May 25, 2020 at 01:23:51PM +0800, Lianbo Jiang wrote:
>>> So, here, let's simplify the logic to improve code readability. If the
>>> KEXEC_SIG_FORCE enabled or kexec lockdown enabled, signature verification
>>> is mandated. Otherwise, we lift the bar for any kernel image.
>>
>> I agree completely; in fact that was my intention when
>> introducing the code, but I got overruled about the return codes:
>> https://lore.kernel.org/lkml/20180119125425.l72meyyc2qtrr...@dwarf.suse.cz/
>>
>> I like this simplification very much, except this part:
>>
>>> +   if (ret) {
>>> +   pr_debug("kernel signature verification failed (%d).\n", ret);
>>
>> ...
>>
>>> -   pr_notice("kernel signature verification failed (%d).\n", ret);
>>
>> I think the log level should stay at most PR_NOTICE when the
>> verification failure results in rejecting the kernel. Perhaps
>> even lower.
>>
> 
> Thank you for the comment, Jiri Bohac.
> 
> I like the idea of staying at most PR_NOTICE, but the pr_notice() will output
> some messages that kernel could want to ignore, such as the case you mentioned
> below.
> 
>> In case verification is not enforced and the failure is
>> ignored, KERN_DEBUG seems reasonable.
>>
> 
> Yes, good understanding. It seems that the pr_debug() is still a good option 
> here?
> Any other thoughts?
> 

Or the following change looks better? What's your opinion?

static int
kimage_validate_signature(struct kimage *image)
{
int ret;

ret = arch_kexec_kernel_verify_sig(image, image->kernel_buf,
   image->kernel_buf_len);
if (ret) {

if (IS_ENABLED(CONFIG_KEXEC_SIG_FORCE)) {
pr_notice("Enforced kernel signature verification 
failed (%d).\n", ret);
return ret;
}

/*
 * If IMA is guaranteed to appraise a signature on the kexec
 * image, permit it even if the kernel is otherwise locked
 * down.
 */
if (!ima_appraise_signature(READING_KEXEC_IMAGE) &&
security_locked_down(LOCKDOWN_KEXEC))
return -EPERM;

pr_debug("kernel signature verification failed (%d).\n", ret);
}

return 0;
}


Thanks.
Lianbo

> Thanks.
> Lianbo
> 
> 
>> Regards,
>>



Re: [PATCH] kexec: Do not verify the signature without the lockdown or mandatory signature

2020-05-26 Thread lijiang
在 2020年05月26日 21:59, Jiri Bohac 写道:
> On Mon, May 25, 2020 at 01:23:51PM +0800, Lianbo Jiang wrote:
>> So, here, let's simplify the logic to improve code readability. If the
>> KEXEC_SIG_FORCE enabled or kexec lockdown enabled, signature verification
>> is mandated. Otherwise, we lift the bar for any kernel image.
> 
> I agree completely; in fact that was my intention when
> introducing the code, but I got overruled about the return codes:
> https://lore.kernel.org/lkml/20180119125425.l72meyyc2qtrr...@dwarf.suse.cz/
> 
> I like this simplification very much, except this part:
> 
>> +if (ret) {
>> +pr_debug("kernel signature verification failed (%d).\n", ret);
> 
> ...
> 
>> -pr_notice("kernel signature verification failed (%d).\n", ret);
> 
> I think the log level should stay at most PR_NOTICE when the
> verification failure results in rejecting the kernel. Perhaps
> even lower.
> 

Thank you for the comment, Jiri Bohac.

I like the idea of staying at most PR_NOTICE, but the pr_notice() will output
some messages that kernel could want to ignore, such as the case you mentioned
below.

> In case verification is not enforced and the failure is
> ignored, KERN_DEBUG seems reasonable.
> 

Yes, good understanding. It seems that the pr_debug() is still a good option 
here?
Any other thoughts?

Thanks.
Lianbo


> Regards,
> 



Re: s390x: kdump kernel can not boot if I load kernel and initrd images via the kexec_file_load syscall.

2020-05-12 Thread lijiang
在 2020年05月13日 01:39, Philipp Rudo 写道:
> Hi Lianbo,
> 
> stupid me obviously never tested the kdump+initrd combination...
> 
> The patch below fixed the problem for me. Could please give it a try, too.
> 

Thank you for the patch, Philipp. Kdump kernel can boot on s390x machine with 
this patch.

> Thanks
> Philipp
> 
> ---
> 
> From 3f77088c9139582261d2e3ee6476324fc1ded401 Mon Sep 17 00:00:00 2001
> From: Philipp Rudo 
> Date: Tue, 12 May 2020 19:25:14 +0200
> Subject: [PATCH] s390/kexec_file: fix initrd location for kdump kernel
> 
> initrd_start must not point at the location the initrd is loaded into
> the crashkernel memory but at the location it will be after the
> crashkernel memory is swapped with the memory at 0.
> 
> Fixes: ee337f5469fd ("s390/kexec_file: Add crash support to image loader")
> Reported-by: Lianbo Jiang 
> Signed-off-by: Philipp Rudo 
> ---
>  arch/s390/kernel/machine_kexec_file.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/s390/kernel/machine_kexec_file.c 
> b/arch/s390/kernel/machine_kexec_file.c
> index 8415ae7d2a23..f9e4baa64b67 100644
> --- a/arch/s390/kernel/machine_kexec_file.c
> +++ b/arch/s390/kernel/machine_kexec_file.c
> @@ -151,7 +151,7 @@ static int kexec_file_add_initrd(struct kimage *image,
>   buf.mem += crashk_res.start;
>   buf.memsz = buf.bufsz;
>  
> - data->parm->initrd_start = buf.mem;
> + data->parm->initrd_start = data->memsz;

Good findings.

>   data->parm->initrd_size = buf.memsz;
>   data->memsz += buf.memsz;
>  
> 

Tested-by: Lianbo Jiang 

Thanks.
Lianbo



Re: s390x: kdump kernel can not boot if I load kernel and initrd images via the kexec_file_load syscall.

2020-05-12 Thread lijiang
There may be more people who would like to care about this issue. So I added
them to cc list.

Thanks.
Lianbo

在 2020年05月12日 17:47, lijiang 写道:
> Also added Dave Young to the cc list. Thanks.
> 
> 在 2020年05月12日 10:52, lijiang 写道:
>> 在 2020年05月11日 23:01, Philipp Rudo 写道:
>>> Hi Lianbo,
>>>
>> Thank you for this reply, Philipp.
>>
>>> one more question. Does the same problem occur withe the kexec_load syscall,
>>> i.e. option '-c' instead of '-s'?
>>>
>> No, kdump kernel can boot with the kexec_load syscal option '-c'.
>>
>> Currently, I only found kdump kernel can not boot with the kexec_file_load 
>> syscall(option '-s').
>>
>>> Thanks
>>> Philipp
>>>
>>> On Mon, 11 May 2020 11:15:58 +0200
>>> Philipp Rudo  wrote:
>>>
>>>> Hi Lianbo,
>>>>
>>>> I believe that your crashkernel memory is simply too small. Pretty much at 
>>>> the
>>>> beginning of the kernel log you have
>>>>
>>>>> [0.070468] setup: The initial RAM disk does not fit into the memory
>>>>
>>>> Although I must say 256M should be enough for most purposes...
>>>>
>>>> Could you please retry with a bigger crashkernel memory?
>>>>
>>
>> I increased the size of crash memory to 512M(crashkernel=512M), kdump kernel 
>> still can
>> not boot, there is a same issue.
>>
>> I added some debug information in the arch/s390/kernel/setup.c, and got the 
>> following logs:
>>
>> [0.070885] Linux version 5.7.0-rc5+ 
>> (r...@ibm-z-124.rhts.eng.bos.redhat.com)
>>  (gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC), GNU ld version 
>> 2.30-73.el8
>> ) #3 SMP Mon May 11 10:28:57 EDT 2020
>>
>> [0.070888] setup: Linux is running as a z/VM guest operating system in 
>> 64-bi
>> t mode   
>>
>> [0.071125] lijiang-debug initrd_start:4aeeb000 size:17180900
>> <--
>> [0.071128] setup: The maximum memory size is 2048MB  
>>
>> [0.071130] cma: Reserved 4 MiB at 0x1fc0 
>>
>> [0.071131] setup: The initial RAM disk does not fit into the memory  
>>
>> [0.071132] lijiang-debug: check_initrd 810 start:4aeeb000, size:17180900 
>>  <--  
>> [0.099765] cpu: 2 configured CPUs, 0 standby CPUs  
>>
>> The size of initrd is 17M, the 512M memory should be enough. I could suspect 
>> that kdump
>> kernel doesn't find an appropriate memory block, thereby this causes the 
>> failure.
>>
>> The compressed initrd is really decompressed in the unpack_to_rootfs().
>>
>> I have a s390 machine with 2cpus and 2G memory, which is too slow. :-)
>>
>>
>> Thanks.
>> Lianbo
>>
>>
>>>> Thanks
>>>> Philipp
>>>>
>>>>
>>>> On Fri, 8 May 2020 18:45:56 +0800
>>>> lijiang  wrote:
>>>>
>>>>> Hi, Philipp Rudo
>>>>>
>>>>> Sorry to disturb you. I ran into a problem on s390 machine, can you help 
>>>>> to have a look?
>>>>>
>>>>> Kdump kernel can not boot on s390x machines if I load the kernel and 
>>>>> initrd images with the kexec_file_load() syscall as below:
>>>>>
>>>>> #kexec -s -p /boot//boot/vmlinuz-5.7.0-rc4+ 
>>>>> --initrd=/boot/initramfs-5.7.0-rc4+kdump.img 
>>>>> --command-line="rd.dasd=0.0.0120 rd.dasd=0.0.0121 rd.dasd=0.0.0122 
>>>>> rd.dasd=0.0.0123 
>>>>> rd.znet=qeth,0.0.8000,0.0.8001,0.0.8002,layer2=1,portname=z-126,portno=0 
>>>>> $tuned_params BOOT_IMAGE=0 nr_cpus=1 cgroup_disable=memory numa=off 
>>>>> udev.children-max=2 panic=10 rootflags=nofail transparent_hugepage=never 
>>>>> novmcoredd nokaslr"
>>>>>
>>>>> But the kexec reboot can work well if I use the kexec_file_load() syscall 
>>>>> as follow:
>>>>>
>>>>> #kexec -s -l  /boot//boot/vmlinuz-5.7.0-rc4+ 
>>>>> --initrd=/boot/initramfs-5.7.0-rc4+kdump.img 
>>>>> --command-line="root=/dev/mapper/rhel_ibm--z--126-root crashkernel=256M 
>>>>> rd.dasd=0.0.0120 rd.dasd=0.0.0121 rd.dasd=0.0.0122 rd.dasd=0.0.0123 
>>>>> rd.lvm.lv=rhel_i

Re: [PATCH 3/3 v3] x86/kdump: clean up all the code related to the backup region

2019-10-14 Thread lijiang
在 2019年10月12日 20:16, Dave Young 写道:
> Hi Eric,
> 
> On 10/12/19 at 06:26am, Eric W. Biederman wrote:
>> Lianbo Jiang  writes:
>>
>>> When the crashkernel kernel command line option is specified, the
>>> low 1MiB memory will always be reserved, which makes that the memory
>>> allocated later won't fall into the low 1MiB area, thereby, it's not
>>> necessary to create a backup region and also no need to copy the first
>>> 640k content to a backup region.
>>>
>>> Currently, the code related to the backup region can be safely removed,
>>> so lets clean up.
>>>
>>> Signed-off-by: Lianbo Jiang 
>>> ---
>>
>>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
>>> index eb651fbde92a..cc5774fc84c0 100644
>>> --- a/arch/x86/kernel/crash.c
>>> +++ b/arch/x86/kernel/crash.c
>>> @@ -173,8 +173,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
>>>  
>>>  #ifdef CONFIG_KEXEC_FILE
>>>  
>>> -static unsigned long crash_zero_bytes;
>>> -
>>>  static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
>>>  {
>>> unsigned int *nr_ranges = arg;
>>> @@ -234,9 +232,15 @@ static int prepare_elf64_ram_headers_callback(struct 
>>> resource *res, void *arg)
>>>  {
>>> struct crash_mem *cmem = arg;
>>>  
>>> -   cmem->ranges[cmem->nr_ranges].start = res->start;
>>> -   cmem->ranges[cmem->nr_ranges].end = res->end;
>>> -   cmem->nr_ranges++;
>>> +   if (res->start >= SZ_1M) {
>>> +   cmem->ranges[cmem->nr_ranges].start = res->start;
>>> +   cmem->ranges[cmem->nr_ranges].end = res->end;
>>> +   cmem->nr_ranges++;
>>> +   } else if (res->end > SZ_1M) {
>>> +   cmem->ranges[cmem->nr_ranges].start = SZ_1M;
>>> +   cmem->ranges[cmem->nr_ranges].end = res->end;
>>> +   cmem->nr_ranges++;
>>> +   }
>>
>> What is going on with this chunk?  I can guess but this needs a clear
>> comment.
> 
> Indeed it needs some code comment, this is based on some offline
> discussion.  cat /proc/vmcore will give a warning because ioremap is
> mapping the system ram.
> 
> We pass the first 1M to kdump kernel in e820 as system ram so that 2nd
> kernel can use the low 1M memory because for example the trampoline
> code.
> 
Thank you, Eric and Dave. I will add the code comment as below if it would be 
OK.

@@ -234,9 +232,20 @@ static int prepare_elf64_ram_headers_callback(struct 
resource *res, void *arg)
 {
struct crash_mem *cmem = arg;
 
-   cmem->ranges[cmem->nr_ranges].start = res->start;
-   cmem->ranges[cmem->nr_ranges].end = res->end;
-   cmem->nr_ranges++;
+   /*
+* Currently, pass the low 1MiB range to kdump kernel in e820
+* as system ram so that kdump kernel can also use the low 1MiB
+* memory due to the real mode trampoline code.
+* And later, the low 1MiB range will be exclued from elf header,
+* which will avoid remapping the 1MiB system ram when dumping
+* vmcore.
+*/
+   if (res->start >= SZ_1M) {
+   cmem->ranges[cmem->nr_ranges].start = res->start;
+   cmem->ranges[cmem->nr_ranges].end = res->end;
+   cmem->nr_ranges++;
+   } else if (res->end > SZ_1M) {
+   cmem->ranges[cmem->nr_ranges].start = SZ_1M;
+   cmem->ranges[cmem->nr_ranges].end = res->end;
+   cmem->nr_ranges++;
+   }
 
return 0;
 }

>>
>>>  
>>> return 0;
>>>  }
>>
>>> @@ -356,9 +337,12 @@ int crash_setup_memmap_entries(struct kimage *image, 
>>> struct boot_params *params)
>>> memset(, 0, sizeof(struct crash_memmap_data));
>>> cmd.params = params;
>>>  
>>> -   /* Add first 640K segment */
>>> -   ei.addr = image->arch.backup_src_start;
>>> -   ei.size = image->arch.backup_src_sz;
>>> +   /*
>>> +* Add the low memory range[0x1000, SZ_1M], skip
>>> +* the first zero page.
>>> +*/
>>> +   ei.addr = PAGE_SIZE;
>>> +   ei.size = SZ_1M - PAGE_SIZE;
>>> ei.type = E820_TYPE_RAM;
>>> add_e820_entry(params, );
>>
>> Likewise here.  Why do we need a special case?
>> Why the magic with PAGE_SIZE?
> 
> Good catch, the zero page part is useless, I think no other special
> reason, just assumed zero page is not usable, but it should be ok to
> remove the special handling, just pass 0 - 1M is good enough.
>> Thanks
> Dave
> 


crash: `kmem -s` reported "kmem: dma-kmalloc-512: slab: ffffe192c0001000 invalid freepointer: e5ffef4e9a040b7e" on a dumped vmcore

2019-08-01 Thread lijiang
Hi, Tom

Recently, i ran into a problem about SME and used crash tool to check the 
vmcore as follow: 

crash> kmem -s | grep -i invalid
kmem: dma-kmalloc-512: slab: e192c0001000 invalid freepointer: 
e5ffef4e9a040b7e
kmem: dma-kmalloc-512: slab: e192c0001000 invalid freepointer: 
e5ffef4e9a040b7e

And the crash tool reported the above error, probably, the main reason is that 
kernel does not
correctly handle the first 640k region when SME is enabled.

When SME is enabled, the kernel and initramfs images are loaded into the 
decrypted memory, and
the backup area(first 640k) is also mapped as decrypted, but the first 640k 
data is copied to
the backup area in purgatory(). Please refer to this file: 
arch/x86/purgatory/purgatory.c
..
static int copy_backup_region(void)
{
if (purgatory_backup_dest) {
memcpy((void *)purgatory_backup_dest,
   (void *)purgatory_backup_src, purgatory_backup_sz);
}
return 0;
}
..

arch/x86/kernel/machine_kexec_64.c
..
machine_kexec_prepare()->
arch_update_purgatory()->
.

Actually, the firs 640k area is encrypted in the first kernel when SME is 
enabled, here kernel
copies the first 640k data to the backup area in purgatory(), because the 
backup area is mapped
as decrypted, this copying operation makes that the first 640k data is 
decrypted(decoded) and
saved to the backup area, but probably kernel can not aware of SME in 
purgatory(), which causes
kernel mistakenly read out the first 640k.

In addition, i hacked kernel code as follow:

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 7bcc92add72c..a51631d36a7a 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -377,6 +378,16 @@ static ssize_t __read_vmcore(char *buffer, size_t buflen, 
loff_t *fpos,
m->offset + m->size - *fpos,
buflen);
start = m->paddr + *fpos - m->offset;
+   if (m->paddr == 0x73f6) {//the backup area's start 
address:0x73f6
+   tmp = read_from_oldmem(buffer, tsz, ,
+   userbuf, false);
+   } else
tmp = read_from_oldmem(buffer, tsz, ,
   userbuf, mem_encrypt_active());
if (tmp < 0)

Here, i used the crash tool to check the vmcore, i can see that the backup area 
is decrypted,
except for the dma-kmalloc-512. So i suspect that kernel did not correctly read 
out the first
640k data to backup area. Do you happen to know how to deal with the first 640k 
area in purgatory()
when SME is enabled? Any idea?

BTW: I' curious the reason why the address of dma-kmalloc-512k always falls 
into the first 640k
region, and i did not see the same issue on another machine.

Machine:
Serial Number   diesel-sys9079-0001
Model   AMD Diesel (A0C)
CPU AMD EPYC 7601 32-Core Processor


Background:
On x86_64, the first 640k region is special because of some historical reasons. 
And kdump kernel will
reuse the first 640k region, so kernel will back up(copy) the first 640k region 
to a backup area in
purgatory(), in order not to rewrite the old region(640k) in kdump kernel, 
which makes sure that kdump
can read out the old memory from vmcore.


Thanks.
Lianbo


Re: linux-next: manual merge of the hmm tree with the tip tree

2019-07-04 Thread lijiang


> On Thu, Jul 04, 2019 at 07:03:52PM +1000, Stephen Rothwell wrote:
>> Hi all,
>>
>> Today's linux-next merge of the hmm tree got a conflict in:
>>
>>   include/linux/ioport.h
>>
>> between commit:
>>
>>   ae9e13d621d6 ("x86/e820, ioport: Add a new I/O resource descriptor 
>> IORES_DESC_RESERVED")
>>   5da04cc86d12 ("x86/mm: Rework ioremap resource mapping determination")
>>
>> from the tip tree and commit:
>>
>>   25b2995a35b6 ("mm: remove MEMORY_DEVICE_PUBLIC support")
>>
>> from the hmm tree.
>>
>> I fixed it up (see below) and can carry the fix as necessary. This
>> is now fixed as far as linux-next is concerned, but any non trivial
>> conflicts should be mentioned to your upstream maintainer when your tree
>> is submitted for merging.  You may also want to consider cooperating
>> with the maintainer of the conflicting tree to minimise any particularly
>> complex conflicts.
>>
>> -- 
>> Cheers,
>> Stephen Rothwell
>>
>> diff --cc include/linux/ioport.h
>> index 5db386cfc2d4,a02b290ca08a..
>> --- a/include/linux/ioport.h
>> +++ b/include/linux/ioport.h
>> @@@ -133,16 -132,6 +133,15 @@@ enum 
>>  IORES_DESC_PERSISTENT_MEMORY= 4,
>>  IORES_DESC_PERSISTENT_MEMORY_LEGACY = 5,
>>  IORES_DESC_DEVICE_PRIVATE_MEMORY= 6,
>> -IORES_DESC_DEVICE_PUBLIC_MEMORY = 7,
>> -IORES_DESC_RESERVED = 8,
>> ++   IORES_DESC_RESERVED = 7,
>>  +};

This change should be OK. Thanks.

Lianbo

>>  +
>>  +/*
>>  + * Flags controlling ioremap() behavior.
>>  + */
>>  +enum {
>>  +   IORES_MAP_SYSTEM_RAM= BIT(0),
>>  +   IORES_MAP_ENCRYPTED = BIT(1),
>>   };
>>   
>>   /* helpers to define resources */
> 
> Looks OK to me, thanks
> 
> Jason
> 


Re: [PATCH v2 1/2] x86/mm: Identify the end of the kernel area to be reserved

2019-06-16 Thread lijiang
After applied the patch series(v2), the kexec-d kernel and the kdump kernel can
successfully boot.

Thanks.

Tested-by: Lianbo Jiang 

在 2019年06月15日 05:15, Lendacky, Thomas 写道:
> The memory occupied by the kernel is reserved using memblock_reserve()
> in setup_arch(). Currently, the area is from symbols _text to __bss_stop.
> Everything after __bss_stop must be specifically reserved otherwise it
> is discarded. This is not clearly documented.
> 
> Add a new symbol, __end_of_kernel_reserve, that more readily identifies
> what is reserved, along with comments that indicate what is reserved,
> what is discarded and what needs to be done to prevent a section from
> being discarded.
> 
> Cc: Baoquan He 
> Cc: Lianbo Jiang 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/include/asm/sections.h | 2 ++
>  arch/x86/kernel/setup.c | 8 +++-
>  arch/x86/kernel/vmlinux.lds.S   | 9 -
>  3 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h
> index 8ea1cfdbeabc..71b32f2570ab 100644
> --- a/arch/x86/include/asm/sections.h
> +++ b/arch/x86/include/asm/sections.h
> @@ -13,4 +13,6 @@ extern char __end_rodata_aligned[];
>  extern char __end_rodata_hpage_align[];
>  #endif
>  
> +extern char __end_of_kernel_reserve[];
> +
>  #endif   /* _ASM_X86_SECTIONS_H */
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 08a5f4a131f5..32eb70625b3b 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -827,8 +827,14 @@ dump_kernel_offset(struct notifier_block *self, unsigned 
> long v, void *p)
>  
>  void __init setup_arch(char **cmdline_p)
>  {
> + /*
> +  * Reserve the memory occupied by the kernel between _text and
> +  * __end_of_kernel_reserve symbols. Any kernel sections after the
> +  * __end_of_kernel_reserve symbol must be explicity reserved with a
> +  * separate memblock_reserve() or it will be discarded.
> +  */
>   memblock_reserve(__pa_symbol(_text),
> -  (unsigned long)__bss_stop - (unsigned long)_text);
> +  (unsigned long)__end_of_kernel_reserve - (unsigned 
> long)_text);
>  
>   /*
>* Make sure page 0 is always reserved because on systems with
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index 0850b5149345..ca2252ca6ad7 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -368,6 +368,14 @@ SECTIONS
>   __bss_stop = .;
>   }
>  
> + /*
> +  * The memory occupied from _text to here, __end_of_kernel_reserve, is
> +  * automatically reserved in setup_arch(). Anything after here must be
> +  * explicitly reserved using memblock_reserve() or it will be discarded
> +  * and treated as available memory.
> +  */
> + __end_of_kernel_reserve = .;
> +
>   . = ALIGN(PAGE_SIZE);
>   .brk : AT(ADDR(.brk) - LOAD_OFFSET) {
>   __brk_base = .;
> @@ -382,7 +390,6 @@ SECTIONS
>   STABS_DEBUG
>   DWARF_DEBUG
>  
> - /* Sections to be discarded */
>   DISCARDS
>   /DISCARD/ : {
>   *(.eh_frame)
> 


When SME is enabled on Dell PowerEdge R7425(AMD) machine, the first kernel can not successfully boot because of the megaraid_sas failure

2019-06-14 Thread lijiang
Hi,

On the Dell PowerEdge R7425(AMD) machine, when SME is enabled, the first kernel 
can not successfully boot because of
the following failure:
..
[  211.950273] megaraid_sas :61:00.0: Init cmd return status FAILED for 
SCSI host 0
[  211.982750] megaraid_sas :61:00.0: Failed from megasas_init_fw 5900
..

Please refer to the kernel log.

Thanks.
Lianbo


Kernel log:

[0.00] Linux version 5.2.0-rc4+ 
(r...@dell-per7425-02.khw.lab.eng.bos.redhat.com) (gcc version 8.3.1 20190507 
(Red Hat 8.3.1-4) (GCC)) #1 SMP Fri Jun 14 23:28:09 EDT 2019
[0.00] Command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.2.0-rc4+ 
root=/dev/mapper/rhel_dell--per7425--02-root ro crashkernel=auto 
resume=/dev/mapper/rhel_dell--per7425--02-swap 
rd.lvm.lv=rhel_dell-per7425-02/root rd.lvm.lv=rhel_dell-per7425-02/swap 
console=ttyS0,115200n81 mem_encrypt=on
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, 
using 'compacted' format.
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0008efff] usable
[0.00] BIOS-e820: [mem 0x0008f000-0x0008] ACPI NVS
[0.00] BIOS-e820: [mem 0x0009-0x0009] usable
[0.00] BIOS-e820: [mem 0x0010-0x6cacefff] usable
[0.00] BIOS-e820: [mem 0x6cacf000-0x6efcefff] reserved
[0.00] BIOS-e820: [mem 0x6efcf000-0x6fdfefff] ACPI NVS
[0.00] BIOS-e820: [mem 0x6fdff000-0x6fffefff] ACPI data
[0.00] BIOS-e820: [mem 0x6000-0x6fff] usable
[0.00] BIOS-e820: [mem 0x7000-0x8fff] reserved
[0.00] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved
[0.00] BIOS-e820: [mem 0xfed8-0xfed80fff] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00107eff] usable
[0.00] BIOS-e820: [mem 0x00107f00-0x00107fff] reserved
[0.00] NX (Execute Disable) protection: active
[0.00] extended physical RAM map:
[0.00] reserve setup_data: [mem 0x-0x0008efff] 
usable
[0.00] reserve setup_data: [mem 0x0008f000-0x0008] 
ACPI NVS
[0.00] reserve setup_data: [mem 0x0009-0x0009] 
usable
[0.00] reserve setup_data: [mem 0x0010-0x49e2401f] 
usable
[0.00] reserve setup_data: [mem 0x49e24020-0x49e8345f] 
usable
[0.00] reserve setup_data: [mem 0x49e83460-0x49e8401f] 
usable
[0.00] reserve setup_data: [mem 0x49e84020-0x49ee345f] 
usable
[0.00] reserve setup_data: [mem 0x49ee3460-0x62c7801f] 
usable
[0.00] reserve setup_data: [mem 0x62c78020-0x62c8f45f] 
usable
[0.00] reserve setup_data: [mem 0x62c8f460-0x62c9001f] 
usable
[0.00] reserve setup_data: [mem 0x62c90020-0x62c9805f] 
usable
[0.00] reserve setup_data: [mem 0x62c98060-0x62c9901f] 
usable
[0.00] reserve setup_data: [mem 0x62c99020-0x62cfb65f] 
usable
[0.00] reserve setup_data: [mem 0x62cfb660-0x62cfc01f] 
usable
[0.00] reserve setup_data: [mem 0x62cfc020-0x62d5e65f] 
usable
[0.00] reserve setup_data: [mem 0x62d5e660-0x6cacefff] 
usable
[0.00] reserve setup_data: [mem 0x6cacf000-0x6efcefff] 
reserved
[0.00] reserve setup_data: [mem 0x6efcf000-0x6fdfefff] 
ACPI NVS
[0.00] reserve setup_data: [mem 0x6fdff000-0x6fffefff] 
ACPI data
[0.00] reserve setup_data: [mem 0x6000-0x6fff] 
usable
[0.00] reserve setup_data: [mem 0x7000-0x8fff] 
reserved
[0.00] reserve setup_data: [mem 0xfec1-0xfec10fff] 
reserved
[0.00] reserve setup_data: [mem 0xfed8-0xfed80fff] 
reserved
[0.00] reserve setup_data: [mem 0x0001-0x00107eff] 
usable
[0.00] reserve setup_data: [mem 0x00107f00-0x00107fff] 
reserved
[0.00] efi: EFI v2.50 by Dell Inc.
[0.00] efi:  ACPI=0x6fffe000  ACPI 2.0=0x6fffe014  SMBIOS=0x6eab3000  
SMBIOS 3.0=0x6eab1000 
[0.00] SMBIOS 3.0.0 present.
[0.00] DMI: Dell Inc. PowerEdge R7425/02MJ3T, BIOS 1.7.6 01/14/2019
[0.00] tsc: Fast TSC calibration using PIT
[0.00] tsc: Detected 1996.202 MHz processor
[

Re: [PATCH] x86/mm: Create an SME workarea in the kernel for early encryption

2019-06-13 Thread lijiang
Hi,

After applied this patch, i made a test on the following machines. It works 
well.

[a] HPE ProLiant DL385 Gen10 (878612-B21) AMD EPYC 7251 8-Core Processor

[b] Dell PowerEdge R7425 AMD EPYC 7401 24-Core Processor

[c] AMD Speedway AMD Eng Sample: 2S1405A3VIHF4_28/14_N

[d] AMD Ethanol X AMD Eng Sample: 2S1402E2VJUG2_20/14_N

On the [a] machine, it works well in the crashkernel=160M case, but need
to remove the CONFIG_DEBUG_INFO from .config, otherwise it could happen the
OOM. And it also works well in all >= 256M(crashkernel) cases(no need to
remove the above option from .config).

On the [b][c][d] machine, it also works fine in the crashkernel=256M cases,
also need to remove the debug info, otherwise it could happen the OOM. And
it also works well in all >= 320M(crashkernel) cases.

Thanks.


Tested-by: Lianbo Jiang 


在 2019年06月12日 21:32, Lendacky, Thomas 写道:
> The SME workarea used during early encryption of the kernel during boot
> is situated on a 2MB boundary after the end of the kernel text, data,
> etc. sections (_end).  This works well during initial boot of a compressed
> kernel because of the relocation used for decompression of the kernel.
> But when performing a kexec boot, there's a chance that the SME workarea
> may not be mapped by the kexec pagetables or that some of the other data
> used by kexec could exist in this range.
> 
> Create a section for SME in the vmlinux.lds.S.  Position it after "_end"
> so that the memory will be reclaimed during boot and, since it is all
> zeroes, it compresses well.  Since this new section will be part of the
> kernel, kexec will account for it in pagetable mappings and placement of
> data after the kernel.
> 
> Here's an example of a kernel size without and with the SME section:
>   without:
>   vmlinux:36,501,616
>   bzImage: 6,497,344
> 
>   1-47f37 : System RAM
> 1e400-1e47677d4 : Kernel code (0x7677d4)
> 1e47677d5-1e4e2e0bf : Kernel data (0x6c68ea)
> 1e5074000-1e5372fff : Kernel bss  (0x2fefff)
> 
>   with:
>   vmlinux:44,419,408
>   bzImage: 6,503,136
> 
>   88000-c7ff7 : System RAM
> 8cf00-8cf7677d4 : Kernel code (0x7677d4)
> 8cf7677d5-8cfe2e0bf : Kernel data (0x6c68ea)
> 8d0074000-8d0372fff : Kernel bss  (0x2fefff)
> 
> Cc: Baoquan He 
> Cc: Lianbo Jiang 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/kernel/vmlinux.lds.S  | 16 
>  arch/x86/mm/mem_encrypt_identity.c | 22 --
>  2 files changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index 0850b5149345..8c4377983e54 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -379,6 +379,22 @@ SECTIONS
>   . = ALIGN(PAGE_SIZE);   /* keep VO_INIT_SIZE page aligned */
>   _end = .;
>  
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> + /*
> +  * SME workarea section: Lives outside of the kernel proper
> +  * (_text - _end) for performing in-place encryption. Resides
> +  * on a 2MB boundary to simplify the pagetable setup used for
> +  * the encryption.
> +  */
> + . = ALIGN(HPAGE_SIZE);
> + .sme : AT(ADDR(.sme) - LOAD_OFFSET) {
> + __sme_begin = .;
> + *(.sme)
> + . = ALIGN(HPAGE_SIZE);
> + __sme_end = .;
> + }
> +#endif
> +
>   STABS_DEBUG
>   DWARF_DEBUG
>  
> diff --git a/arch/x86/mm/mem_encrypt_identity.c 
> b/arch/x86/mm/mem_encrypt_identity.c
> index 4aa9b1480866..c55c2ec8fb12 100644
> --- a/arch/x86/mm/mem_encrypt_identity.c
> +++ b/arch/x86/mm/mem_encrypt_identity.c
> @@ -73,6 +73,19 @@ struct sme_populate_pgd_data {
>   unsigned long vaddr_end;
>  };
>  
> +/*
> + * This work area lives in the .sme section, which lives outside of
> + * the kernel proper. It is sized to hold the intermediate copy buffer
> + * and more than enough pagetable pages.
> + *
> + * By using this section, the kernel can be encrypted in place and we
> + * avoid any possibility of boot parameters or initramfs images being
> + * placed such that the in-place encryption logic overwrites them.  This
> + * section is 2MB aligned to allow for simple pagetable setup using only
> + * PMD entries (see vmlinux.lds.S).
> + */
> +static char sme_workarea[2 * PMD_PAGE_SIZE] __section(.sme);
> +
>  static char sme_cmdline_arg[] __initdata = "mem_encrypt";
>  static char sme_cmdline_on[]  __initdata = "on";
>  static char sme_cmdline_off[] __initdata = "off";
> @@ -314,8 +327,13 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
>   }
>  #endif
>  
> - /* Set the encryption workarea to be immediately after the kernel */
> - workarea_start = kernel_end;
> + /*
> +  * We're running identity mapped, so we 

Re: [PATCH v2] scsi: smartpqi: properly set both the DMA mask and the coherent DMA mask in pqi_pci_init()

2019-05-29 Thread lijiang
在 2019年05月30日 10:00, Martin K. Petersen 写道:
> 
> Lianbo,
> 
>> When SME is enabled, the smartpqi driver won't work on the HP DL385
>> G10 machine, which causes the failure of kernel boot because it fails
>> to allocate pqi error buffer. Please refer to the kernel log:
> 
> Applied to 5.2/scsi-fixes, thanks!
> 
OK, thank you, Martin K.

And also thanks to Tom and Don for helping review this patch.

Lianbo


Re: [PATCH] scsi: smartpqi: properly set both the DMA mask and the coherent DMA mask in pqi_pci_init()

2019-05-24 Thread lijiang
在 2019年05月24日 06:55, don.br...@microchip.com 写道:
> -Original Message-
> From: linux-scsi-ow...@vger.kernel.org 
> [mailto:linux-scsi-ow...@vger.kernel.org] On Behalf Of Lendacky, Thomas
> Sent: Thursday, May 23, 2019 9:45 AM
> To: Lianbo Jiang ; linux-kernel@vger.kernel.org
> Cc: don.br...@microsemi.com; j...@linux.ibm.com; martin.peter...@oracle.com; 
> linux-s...@vger.kernel.org; esc.storage...@microsemi.com; dyo...@redhat.com
> Subject: Re: [PATCH] scsi: smartpqi: properly set both the DMA mask and the 
> coherent DMA mask in pqi_pci_init()
> 
> On 5/23/19 12:52 AM, Lianbo Jiang wrote:
>> When SME is enabled, the smartpqi driver won't work on the HP DL385
>> G10 machine, which causes the failure of kernel boot because it fails 
>> to allocate pqi error buffer. Please refer to the kernel log:
>> 
>> [9.431749] usbcore: registered new interface driver uas
>> [9.441524] Microsemi PQI Driver (v1.1.4-130)
>> [9.442956] i40e :04:00.0: fw 6.70.48768 api 1.7 nvm 10.2.5
>> [9.447237] smartpqi :23:00.0: Microsemi Smart Family Controller found
>>  Starting dracut initqueue hook...
>> [  OK  ] Started Show Plymouth Boot Scre[9.471654] Broadcom 
>> NetXtreme-C/E driver bnxt_en v1.9.1
>> en.
>> [  OK  ] Started Forward Password Requests to Plymouth Directory Watch.
>> [[0;[9.487108] smartpqi :23:00.0: failed to allocate PQI error buffer
>> 
>> [  139.050544] dracut-initqueue[949]: Warning: dracut-initqueue 
>> timeout - starting timeout scripts [  139.589779] 
>> dracut-initqueue[949]: Warning: dracut-initqueue timeout - starting 
>> timeout scripts
>>
>> For correct operation, lets call the dma_set_mask_and_coherent() to 
>> properly set the mask for both streaming and coherent, in order to 
>> inform the kernel about the devices DMA addressing capabilities.
> 
> You should probably expand on this a bit...  Basically, the fact that the 
> coherent DMA mask value wasn't set caused the driver to fall back to SWIOTLB 
> when SME is active.

Thank you, Tom.

> I'm not sure if the failure was from running out of SWIOTLB or exceeding the 
> maximum allocation size for SWIOTLB
If so, it should print some messages like "swiotlb buffer is full", but i did 
not get such a log.

> I believe the fix is proper, but I'll let the driver owner comment on that.
> 
> Thanks,
> Tom
> 
> Acked-by: Don Brace 
> Tested-by: Don Brace 
> 
> Please add the extra description suggested by Thomas.
> 
OK, i will add Tom's description to patch log and post again.

Thank you, Don.

Lianbo
> 
>>
>> Signed-off-by: Lianbo Jiang 
>> ---
>>  drivers/scsi/smartpqi/smartpqi_init.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/smartpqi/smartpqi_init.c 
>> b/drivers/scsi/smartpqi/smartpqi_init.c
>> index c26cac819f9e..8b1fde6c7dab 100644
>> --- a/drivers/scsi/smartpqi/smartpqi_init.c
>> +++ b/drivers/scsi/smartpqi/smartpqi_init.c
>> @@ -7282,7 +7282,7 @@ static int pqi_pci_init(struct pqi_ctrl_info 
>> *ctrl_info)
>> else
>> mask = DMA_BIT_MASK(32);
>>
>> -   rc = dma_set_mask(_info->pci_dev->dev, mask);
>> +   rc = dma_set_mask_and_coherent(_info->pci_dev->dev, 
>> + mask);
>> if (rc) {
>> dev_err(_info->pci_dev->dev, "failed to set DMA 
>> mask\n");
>> goto disable_device;
>> --
>> 2.17.1
>>


Re: [PATCH 2/3 v3] x86/kexec: Set the C-bit in the identity map page table when SEV is active

2019-05-15 Thread lijiang
在 2019年05月15日 21:30, Borislav Petkov 写道:
> On Tue, Apr 30, 2019 at 03:44:20PM +0800, Lianbo Jiang wrote:
>> When SEV is active, the second kernel image is loaded into the
>> encrypted memory. Lets make sure that when kexec builds the
>> identity mapping page table it adds the memory encryption mask(C-bit).
>>
>> Co-developed-by: Brijesh Singh 
>> Signed-off-by: Brijesh Singh 
>> Signed-off-by: Lianbo Jiang 
>> ---
>>  arch/x86/kernel/machine_kexec_64.c | 12 +++-
>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/machine_kexec_64.c 
>> b/arch/x86/kernel/machine_kexec_64.c
>> index f60611531d17..11fe352f7344 100644
>> --- a/arch/x86/kernel/machine_kexec_64.c
>> +++ b/arch/x86/kernel/machine_kexec_64.c
>> @@ -56,6 +56,7 @@ static int init_transition_pgtable(struct kimage *image, 
>> pgd_t *pgd)
>>  pte_t *pte;
>>  unsigned long vaddr, paddr;
>>  int result = -ENOMEM;
>> +pgprot_t prot = PAGE_KERNEL_EXEC_NOENC;
>>  
>>  vaddr = (unsigned long)relocate_kernel;
>>  paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
>> @@ -92,7 +93,11 @@ static int init_transition_pgtable(struct kimage *image, 
>> pgd_t *pgd)
>>  set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
>>  }
>>  pte = pte_offset_kernel(pmd, vaddr);
>> -set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC_NOENC));
>> +
>> +if (sev_active())
>> +prot = PAGE_KERNEL_EXEC;
>> +
>> +set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
>>  return 0;
>>  err:
>>  return result;
>> @@ -129,6 +134,11 @@ static int init_pgtable(struct kimage *image, unsigned 
>> long start_pgtable)
>>  level4p = (pgd_t *)__va(start_pgtable);
>>  clear_page(level4p);
>>  
>> +if (sev_active()) {
>> +info.page_flag |= _PAGE_ENC;
>> +info.kernpg_flag = _KERNPG_TABLE;
> 
> kernpg_flag above is initialized to _KERNPG_TABLE_NOENC so you can do here
> 
>   info.kernpg_flag |= _PAGE_ENC;
> 
> too, to make it even more clear what this does, right?
> 
OK, i will modify it according to your suggestion and post again.

Thanks.
Lianbo

> IOW:
> 
> diff --git a/arch/x86/kernel/machine_kexec_64.c 
> b/arch/x86/kernel/machine_kexec_64.c
> index 783ce5184405..16c37fe489bc 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -135,8 +135,8 @@ static int init_pgtable(struct kimage *image, unsigned 
> long start_pgtable)
> clear_page(level4p);
>  
> if (sev_active()) {
> -   info.page_flag |= _PAGE_ENC;
> -   info.kernpg_flag = _KERNPG_TABLE;
> +   info.page_flag   |= _PAGE_ENC;
> +   info.kernpg_flag |= _PAGE_ENC;
> }
>  
> if (direct_gbpages)
> 
> 


Re: [PATCH 1/2 v8] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'

2019-01-25 Thread lijiang
在 2018年12月05日 05:33, Lendacky, Thomas 写道:
> On 11/29/2018 09:37 PM, Dave Young wrote:
>> + more people
>>
>> On 11/29/18 at 04:09pm, Lianbo Jiang wrote:
>>> When doing kexec_file_load, the first kernel needs to pass the e820
>>> reserved ranges to the second kernel. But kernel can not exactly
>>> match the e820 reserved ranges when walking through the iomem resources
>>> with the descriptor 'IORES_DESC_NONE', because several e820 types(
>>> e.g. E820_TYPE_RESERVED_KERN/E820_TYPE_RAM/E820_TYPE_UNUSABLE/E820
>>> _TYPE_RESERVED) are converted to the descriptor 'IORES_DESC_NONE'. It
>>> may pass these four types to the kdump kernel, that is not desired result.
>>>
>>> So, this patch adds a new I/O resource descriptor 'IORES_DESC_RESERVED'
>>> for the iomem resources search interfaces. It is helpful to exactly
>>> match the reserved resource ranges when walking through iomem resources.
>>>
>>> In addition, since the new descriptor 'IORES_DESC_RESERVED' is introduced,
>>> these code originally related to the descriptor 'IORES_DESC_NONE' need to
>>> be updated. Otherwise, it will be easily confused and also cause some
>>> errors. Because the 'E820_TYPE_RESERVED' type is converted to the new
>>> descriptor 'IORES_DESC_RESERVED' instead of 'IORES_DESC_NONE', it has been
>>> changed.
>>>
>>> Suggested-by: Dave Young 
>>> Signed-off-by: Lianbo Jiang 
>>> ---
>>>  arch/ia64/kernel/efi.c |  4 
>>>  arch/x86/kernel/e820.c |  2 +-
>>>  arch/x86/mm/ioremap.c  | 13 -
>>>  include/linux/ioport.h |  1 +
>>>  kernel/resource.c  |  6 +++---
>>>  5 files changed, 21 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c
>>> index 8f106638913c..1841e9b4db30 100644
>>> --- a/arch/ia64/kernel/efi.c
>>> +++ b/arch/ia64/kernel/efi.c
>>> @@ -1231,6 +1231,10 @@ efi_initialize_iomem_resources(struct resource 
>>> *code_resource,
>>> break;
>>>  
>>> case EFI_RESERVED_TYPE:
>>> +   name = "reserved";
>>
>> Ingo updated X86 code to use "Reserved",  I think it would be good to do
>> same for this case as well
>>
>>> +   desc = IORES_DESC_RESERVED;
>>> +   break;
>>> +
>>> case EFI_RUNTIME_SERVICES_CODE:
>>> case EFI_RUNTIME_SERVICES_DATA:
>>> case EFI_ACPI_RECLAIM_MEMORY:
>>
>> Originally, above 3 are all "reserved", so probably they all should be
>> IORES_DESC_RESERVED.
>>
>> Can any IA64 people to review this?
>>
>>> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
>>> index 50895c2f937d..57fafdafb860 100644
>>> --- a/arch/x86/kernel/e820.c
>>> +++ b/arch/x86/kernel/e820.c
>>> @@ -1048,10 +1048,10 @@ static unsigned long __init 
>>> e820_type_to_iores_desc(struct e820_entry *entry)
>>> case E820_TYPE_NVS: return IORES_DESC_ACPI_NV_STORAGE;
>>> case E820_TYPE_PMEM:return IORES_DESC_PERSISTENT_MEMORY;
>>> case E820_TYPE_PRAM:return 
>>> IORES_DESC_PERSISTENT_MEMORY_LEGACY;
>>> +   case E820_TYPE_RESERVED:return IORES_DESC_RESERVED;
>>> case E820_TYPE_RESERVED_KERN:   /* Fall-through: */
>>> case E820_TYPE_RAM: /* Fall-through: */
>>> case E820_TYPE_UNUSABLE:/* Fall-through: */
>>> -   case E820_TYPE_RESERVED:/* Fall-through: */
>>> default:return IORES_DESC_NONE;
>>> }
>>>  }
>>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>>> index 5378d10f1d31..fea2ef99415d 100644
>>> --- a/arch/x86/mm/ioremap.c
>>> +++ b/arch/x86/mm/ioremap.c
>>> @@ -83,7 +83,18 @@ static bool __ioremap_check_ram(struct resource *res)
>>>  
>>>  static int __ioremap_check_desc_other(struct resource *res)
>>>  {
>>> -   return (res->desc != IORES_DESC_NONE);
>>> +   /*
>>> +* But now, the 'E820_TYPE_RESERVED' type is converted to the new
>>> +* descriptor 'IORES_DESC_RESERVED' instead of 'IORES_DESC_NONE',
>>> +* it has been changed. And the value of 'mem_flags.desc_other'
>>> +* is equal to 'true' if we don't strengthen the condition in this
>>> +* function, that is wrong. Because originally it is equal to
>>> +* 'false' for the same reserved type.
>>> +*
>>> +* So, that would be nice to keep it the same as before.
>>> +*/
>>> +   return ((res->desc != IORES_DESC_NONE) &&
>>> +   (res->desc != IORES_DESC_RESERVED));
>>>  }
>>
>> Added Tom since he added the check function.  Is it possible to only
>> check explict valid desc types instead of exclude IORES_DESC_NONE?
> 
> Sorry for the delay...
> 
> The original intent of the check was to map most memory as encrypted under
> SEV if it was marked with a specific descriptor, since it was likely to
> not be MMIO. I tried converting most things that mapped memory to memremap
> vs ioremap, but ACPI was one area that I left alone and this check catches
> the mapping of the ACPI tables. I 

Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-15 Thread lijiang
在 2019年01月15日 04:49, Dave Anderson 写道:
> 
> 
> - Original Message -
>> On Mon, Jan 14, 2019 at 03:26:32PM -0500, Dave Anderson wrote:
>>> No.  It needs *both* the vmlinux file and the vmcore file in order to read
>>> kernel
>>> virtual memory, so just having a kernel virtual address is insufficient.
>>>
>>> So it's a chicken-and-egg situation.  This particular --osrelease option is 
>>> used
>>> to determine *what* vmlinux file would be required for an actual crash 
>>> analysis
>>> session.
>>
>> Ok, that makes sense. I could've used that explanation when reviewing
>> the documentation. Do you mind skimming through this:
>>
>> https://lkml.kernel.org/r/2019045650.gg4...@zn.tnic
>>
>> in case we've missed explaining relevant usage - like that above - of
>> some of the vmcoreinfo members?
> 
> Yeah, I've been watching the thread, and the document looks fine to me.
> It's just that when I saw the discussion of this one being removed that
> I felt the need to respond...  ;-)

Thank you for explaining this issue. 

Regards,
Lianbo

> 
> Dave
> 


Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-15 Thread lijiang
在 2019年01月14日 17:15, Borislav Petkov 写道:
> On Mon, Jan 14, 2019 at 01:30:30PM +0800, lijiang wrote:
>> I noticed that the checkpatch was coded in Perl. But i am not familiar with
>> the Perl program language, that would be beyond my ability to do this, i have
>> to learn the Perl program language step by step. :-)
> 
> You could give it a try - it is not hard :-)
> 

Thank you for encouraging me to do the things.

> And there's no hurry for this, take your time.
> 
OK. I would like to put this task in my queue. Once i have free time, i will 
work
on this issue. 

>> Do you mean this one 'KERNEL_IMAGE_SIZE'?
> 
> I mean, all those which are unused. Optimally, you should look at the
> tools and see whether they're using those exports and if not, remove
> them. But no hurry here too, take your time.
> 
OK. I will check it again, But, basically they are used by Makedumpfile or 
Crash tools.


> My final goal is to have this up-to-date documentation of what is
> exported and what is used by user tools so that people can look at it
> first before carelessly exporting yet another thing.
> 
Agree.

Thanks,
Lianbo
> Thx.
> 


Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-13 Thread lijiang
在 2019年01月11日 22:56, Borislav Petkov 写道:
> On Thu, Jan 10, 2019 at 08:19:43PM +0800, Lianbo Jiang wrote:
>> This document lists some variables that export to vmcoreinfo, and briefly
>> describles what these variables indicate. It should be instructive for
>> many people who do not know the vmcoreinfo.
>>
>> Suggested-by: Borislav Petkov 
>> Signed-off-by: Lianbo Jiang 
>> ---
>>  Documentation/kdump/vmcoreinfo.txt | 500 +
>>  1 file changed, 500 insertions(+)
>>  create mode 100644 Documentation/kdump/vmcoreinfo.txt
> 
> Ok, below is what I'm going to commit if no one complains. I hope you'd
> find some time to work on adding the checkpatch check for patches which
> add vmcoreinfo members but do not document them

I noticed that the checkpatch was coded in Perl. But i am not familiar with
the Perl program language, that would be beyond my ability to do this, i have
to learn the Perl program language step by step. :-)

> and also remove those vmcoreinfo members which are unused.
> 

Do you mean this one 'KERNEL_IMAGE_SIZE'?

Currently unused by Makedumpfile, but used to compute the module virtual
address by Crash.

I have corrected this issue in VMCOREINFO doc.

Thanks.
Lianbo

> Which should be easy because we don't have to be backwards-compatible
> with makedumpfile as this is not an ABI.
> 
> Thx.
> 
> ---
> From: Lianbo Jiang 
> Date: Thu, 10 Jan 2019 20:19:43 +0800
> Subject: [PATCH] kdump: Document kernel data exported in the vmcoreinfo note
> 
> Document data exported in vmcoreinfo and briefly describe its use by
> userspace tools.a
> 
>  [ bp: heavily massage and redact the text. ]
> 
> Suggested-by: Borislav Petkov 
> Signed-off-by: Lianbo Jiang 
> Signed-off-by: Borislav Petkov 
> Cc: Andrew Morton 
> Cc: Baoquan He 
> Cc: Dave Young 
> Cc: Jonathan Corbet 
> Cc: Thomas Gleixner 
> Cc: Vivek Goyal 
> Cc: ander...@redhat.com
> Cc: k-ha...@ab.jp.nec.com
> Cc: ke...@lists.infradead.org
> Cc: linux-...@vger.kernel.org
> Cc: mi...@redhat.com
> Cc: x86-ml 
> Link: https://lkml.kernel.org/r/20190110121944.6050-2-liji...@redhat.com
> ---
>  Documentation/kdump/vmcoreinfo.txt | 494 +
>  1 file changed, 494 insertions(+)
>  create mode 100644 Documentation/kdump/vmcoreinfo.txt
> 
> diff --git a/Documentation/kdump/vmcoreinfo.txt 
> b/Documentation/kdump/vmcoreinfo.txt
> new file mode 100644
> index ..2dc3797940a3
> --- /dev/null
> +++ b/Documentation/kdump/vmcoreinfo.txt
> @@ -0,0 +1,494 @@
> +
> + VMCOREINFO
> +
> +
> +===
> +What is it?
> +===
> +
> +VMCOREINFO is a special ELF note section. It contains various
> +information from the kernel like structure size, page size, symbol
> +values, field offsets, etc. These data are packed into an ELF note
> +section and used by user-space tools like crash and makedumpfile to
> +analyze a kernel's memory layout.
> +
> +
> +Common variables
> +
> +
> +init_uts_ns.name.release
> +
> +
> +The version of the Linux kernel. Used to find the corresponding source
> +code from which the kernel has been built.
> +
> +PAGE_SIZE
> +-
> +
> +The size of a page. It is the smallest unit of data used by the memory
> +management facilities. It is usually 4096 bytes of size and a page is
> +aligned on 4096 bytes. Used for computing page addresses.
> +
> +init_uts_ns
> +---
> +
> +The UTS namespace which is used to isolate two specific elements of the
> +system that relate to the uname(2) system call. It is named after the
> +data structure used to store information returned by the uname(2) system
> +call.
> +
> +User-space tools can get the kernel name, host name, kernel release
> +number, kernel version, architecture name and OS type from it.
> +
> +node_online_map
> +---
> +
> +An array node_states[N_ONLINE] which represents the set of online nodes
> +in a system, one bit position per node number. Used to keep track of
> +which nodes are in the system and online.
> +
> +swapper_pg_dir
> +-
> +
> +The global page directory pointer of the kernel. Used to translate
> +virtual to physical addresses.
> +
> +_stext
> +--
> +
> +Defines the beginning of the text section. In general, _stext indicates
> +the kernel start address. Used to convert a virtual address from the
> +direct kernel map to a physical address.
> +
> +vmap_area_list
> +--
> +
> +Stores the virtual area list. makedumpfile gets the vmalloc start value
> +from this variable and its value is necessary for vmalloc translation.
> +
> +mem_map
> +---
> +
> +Physical addresses are translated to struct pages by treating them as
> +an index into the mem_map array. Right-shifting a physical address
> +PAGE_SHIFT bits converts it into a page frame number which is an index
> +into that 

Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-13 Thread lijiang
在 2019年01月11日 20:33, Borislav Petkov 写道:
> On Thu, Jan 10, 2019 at 08:19:43PM +0800, Lianbo Jiang wrote:
>> +init_uts_ns.name.release
>> +
>> +
>> +The version of the Linux kernel. Used to find the corresponding source
>> +code from which the kernel has been built.
>> +
> 
> ...
> 
>> +
>> +init_uts_ns
>> +---
>> +
>> +This is the UTS namespace, which is used to isolate two specific
>> +elements of the system that relate to the uname(2) system call. The UTS
>> +namespace is named after the data structure used to store information
>> +returned by the uname(2) system call.
>> +
>> +User-space tools can get the kernel name, host name, kernel release
>> +number, kernel version, architecture name and OS type from it.
> 
> Already asked this but no reply so lemme paste my question again:
> 
> "And this document already fulfills its purpose - those two vmcoreinfo
> exports are redundant and the first one can be removed.
> 
> And now that we agreed that VMCOREINFO is not an ABI and is very tightly
> coupled to the kernel version, init_uts_ns.name.release can be removed,
> yes?
> 
> Or is there anything speaking against that?"
> 

Sorry for this, that is my mistake. Thanks for your reminder.

I agree on your point of view. But i forgot that i should remove this variable
in this post.

I would like to remove this variable and post again. 

Thanks.
Lianbo


Re: [PATCH 2/2 v5] kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo

2019-01-07 Thread lijiang
在 2019年01月07日 10:29, Baoquan He 写道:
> On 01/07/19 at 09:47am, Lianbo Jiang wrote:
>> For AMD machine with SME feature, makedumpfile tools need to know
>> whether the crash kernel was encrypted or not. If SME is enabled
>   ^ crashed
>> in the first kernel, the crash kernel's page table(pgd/pud/pmd/pte)
>^ crashed
>> contains the memory encryption mask, so need to remove the sme mask
> "makedumpfile needs" or
>in makedumpfile need to remove...

Thanks for your comment.

This will be modified in next post.

Regards,
Lianbo

>> to obtain the true physical address.
>>
>> Signed-off-by: Lianbo Jiang 
>> ---
>>  arch/x86/kernel/machine_kexec_64.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/arch/x86/kernel/machine_kexec_64.c 
>> b/arch/x86/kernel/machine_kexec_64.c
>> index 4c8acdfdc5a7..bc4108096b18 100644
>> --- a/arch/x86/kernel/machine_kexec_64.c
>> +++ b/arch/x86/kernel/machine_kexec_64.c
>> @@ -352,10 +352,13 @@ void machine_kexec(struct kimage *image)
>>  
>>  void arch_crash_save_vmcoreinfo(void)
>>  {
>> +u64 sme_mask = sme_me_mask;
>> +
>>  VMCOREINFO_NUMBER(phys_base);
>>  VMCOREINFO_SYMBOL(init_top_pgt);
>>  vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n",
>>  pgtable_l5_enabled());
>> +VMCOREINFO_NUMBER(sme_mask);
>>  
>>  #ifdef CONFIG_NUMA
>>  VMCOREINFO_SYMBOL(node_data);
>> -- 
>> 2.17.1
>>


Re: [PATCH 1/2 v5] kdump: add the vmcoreinfo documentation

2019-01-07 Thread lijiang
在 2019年01月07日 15:55, Hatayama, Daisuke 写道:
> Hi,
> 
>> -Original Message-
>> From: linux-kernel-ow...@vger.kernel.org
>> [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Lianbo Jiang
>> Sent: Monday, January 7, 2019 10:48 AM
>> To: linux-kernel@vger.kernel.org
>> Cc: ke...@lists.infradead.org; t...@linutronix.de; mi...@redhat.com;
>> b...@alien8.de; x...@kernel.org; a...@linux-foundation.org; b...@redhat.com;
>> dyo...@redhat.com; linux-...@vger.kernel.org; k-ha...@ab.jp.nec.com;
>> ander...@redhat.com
>> Subject: [PATCH 1/2 v5] kdump: add the vmcoreinfo documentation
>>
>> This document lists some variables that export to vmcoreinfo, and briefly
>> describles what these variables indicate. It should be instructive for
>> many people who do not know the vmcoreinfo, and it also normalizes the
> 
> I agree to this part, but
> 
>> exported variables as a convention between kernel and use-space.
> 
> I don't agree to this part.
> 
> The meaning of each symbol is decided by each feature in the kernel,
> not by vmcoreinfo. I suspect anyone mistakenly understand this document is
> ABI enforcing each symbol works as described. We can change symbols and
> their meaning regardless of this document.
> 
> Oh, I found this topic has already been discussed at v3, and
> you removed "ABI" in the patch description at v4.
> 
> But it seems still confusing to me.
> I think the explicit description saying that this is for user-land tools,
> they treats each symbol as described,
> and the document never affect implementation of each kernel components,
> is necessary in e.g. "Purpose of this document" section?
> 

Thanks for your advice.

If this part could make the document become a rope tied around our necks, i 
would
like to remove this part from the patch log in next post.

Regards,
Lianbo

>>
>> Suggested-by: Borislav Petkov 
>> Signed-off-by: Lianbo Jiang 
>> ---
>>  Documentation/kdump/vmcoreinfo.txt | 500 +
>>  1 file changed, 500 insertions(+)
>>  create mode 100644 Documentation/kdump/vmcoreinfo.txt
>>
>> diff --git a/Documentation/kdump/vmcoreinfo.txt
>> b/Documentation/kdump/vmcoreinfo.txt
>> new file mode 100644
>> index ..8e444586b87b
>> --- /dev/null
>> +++ b/Documentation/kdump/vmcoreinfo.txt
>> @@ -0,0 +1,500 @@
>> +
>> +VMCOREINFO
>> +
>> +
>> +===
>> +What is the VMCOREINFO?
>> +===
>> +
>> +VMCOREINFO is a special ELF note section. It contains various
>> +information from the kernel like structure size, page size, symbol
>> +values, field offsets, etc. These data are packed into an ELF note
>> +section and used by user-space tools like crash and makedumpfile to
>> +analyze a kernel's memory layout.
>> +
>> +
>> +Common variables
>> +
>> +
>> +init_uts_ns.name.release
>> +
>> +
>> +The version of the Linux kernel. Used to find the corresponding source
>> +code from which the kernel has been built.
>> +
>> +PAGE_SIZE
>> +-
>> +
>> +The size of a page. It is the smallest unit of data for memory
>> +management in kernel. It is usually 4096 bytes and a page is aligned
>> +on 4096 bytes. Used for computing page addresses.
>> +
>> +init_uts_ns
>> +---
>> +
>> +This is the UTS namespace, which is used to isolate two specific
>> +elements of the system that relate to the uname(2) system call. The UTS
>> +namespace is named after the data structure used to store information
>> +returned by the uname(2) system call.
>> +
>> +User-space tools can get the kernel name, host name, kernel release
>> +number, kernel version, architecture name and OS type from it.
>> +
>> +node_online_map
>> +---
>> +
>> +An array node_states[N_ONLINE] which represents the set of online node
>> +in a system, one bit position per node number. Used to keep track of
>> +which nodes are in the system and online.
>> +
>> +swapper_pg_dir
>> +-
>> +
>> +The global page directory pointer of the kernel. Used to translate
>> +virtual to physical addresses.
>> +
>> +_stext
>> +--
>> +
>> +Defines the beginning of the text section. In general, _stext indicates
>> +the kernel start address. Used to convert a virtual address from the
>> +direct kernel map to a physical address.
>> +
>> +vmap_area_list
>> +--
>> +
>> +Stores the virtual area list. makedumpfile can get the vmalloc start
>> +value from this variable. This value is necessary for vmalloc translation.
>> +
>> +mem_map
>> +---
>> +
>> +Physical addresses are translated to struct pages by treating them as
>> +an index into the mem_map array. Right-shifting a physical address
>> +PAGE_SHIFT bits converts it into a page frame number which is an index
>> +into that mem_map array.
>> +
>> +Used to map an address to the corresponding struct page.

Re: [PATCH 1/2 v8] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'

2019-01-06 Thread lijiang
在 2018年12月07日 04:11, Borislav Petkov 写道:
> On Fri, Nov 30, 2018 at 03:04:44PM +0800, lijiang wrote:
>> I have noticed the changes on x86, but for IA64, i'm not sure whether it 
>> should do the same
>> thing, so keep it as before.
>>
>> If IA64 people would like to give any comment, that will be appreciated.
> 
> I think you should not touch ia64 and not make Tony unnecessarily power
> up the old rust.
> 

Ok, i will get rid of previous changes to IA64 in next post.

Thanks.

> :-)
> 


Re: [PATCH 1/2 v4] kdump: add the vmcoreinfo documentation

2019-01-03 Thread lijiang
在 2019年01月04日 03:28, Kazuhito Hagio 写道:
> Hi Lianbo,
> 
> -Original Message-
>> +===
>> +What is the VMCOREINFO?
>> +===
>> +
>> +VMCOREINFO is a special ELF note section. It contains various
>> +information from the kernel like structure size, page size, symbol
>> +values, field offsets, etc. These data are packed into an ELF note
>> +section and used by user-space tools like crash and makedumpfile to
>> +analyze a kernel's memory layout.
>> +
>> +To dump the VMCOREINFO contents, one can do:
>> +
>> +# makedumpfile -g VMCOREINFO -x vmlinux
> 
> again, this command does not dump the VMCOREINFO in ELF note section.
> It converts the vmlinux's debug infomation into a VMCOREINFO-like data.
> So I don't think this command is suitable to introduce here.
> 

Thank you, Kazu.

As you mentioned, makedumpfile in 'devel' branch can print VMCOREINFO in 
/proc/kcore,
can i add the following command to this document?

#makedumpfile --mem-usage /proc/kcore -D

>> +PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|
>> +PG_hwpoision|PG_head_mask
>> +-
>> +
>> +Page attributes. These flags are used to filter free pages.
> 
> Some of these are not used to filter *free* pages, so
> 
> ... used to filter various unnecessary pages.
> 

Great. I will modify it in next post.

And also merge the 'PG_buddy' and 'PG_offline'  into the PG_* flag here.

Many thanks.

Lianbo

>> +PAGE_BUDDY_MAPCOUNT_VALUE or ~PG_buddy
>> +--
> 
> then, this can be merged into the one above?
> 
>> +==
>> +x86_64
>> +==
> ...
>> +PAGE_OFFLINE_MAPCOUNT_VALUE(~PG_offline)
>> +
> 
> This looks not only for x86_64, and also can be merged into
> the PG_* flags?
> 
> Thank you for your effort!
> Kazu
> 
> 


Re: [PATCH 1/2 v3] kdump: add the vmcoreinfo documentation

2018-12-25 Thread lijiang
在 2018年12月26日 11:36, Dave Young 写道:
> On 12/26/18 at 11:24am, Dave Young wrote:
> +
> +KERNEL_IMAGE_SIZE
> +=
> +The size of 'KERNEL_IMAGE_SIZE', currently unused.

 So remove?

>>>
>>> I'm not sure whether it should be removed, so i keep it.
>>
>> Just remove it.  It was added by Baoquan for KASLR issues, later
>> makedumpfile reverted the userspace part and added other implementation.
>>
>> In case old makedumpfile does not support new kernel, it has some kernel
>> versions support list in code, thus no worry about the compatibility
>> issue.
> 
> Ah, it is not unused actually, clone crash tool git:
> $ git grep KERNEL_IMAGE_SIZE
> x86_64.c:   if ((string = 
> pc->read_vmcoreinfo("NUMBER(KERNEL_IMAGE_SIZE)"))) {
> 
> So in the documentation, the use cases of crash tool should also be
> covered.
> 

Sure, maybe only this one was ignored.

I will improve this variable in the documentation.

> Lianbo, it would be good to cc Dave and Kazu for these patches, could
> you cc them in your next post?
> 

Yes, i will add Dave and Kazu, and also resend patch v4.

Thanks.

>>
>> Thanks
>> Dave
> 
> Thanks
> Dave
> 


Re: [PATCH 2/2 v3] kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo

2018-12-17 Thread lijiang
在 2018年12月17日 21:01, Borislav Petkov 写道:
> On Sun, Dec 16, 2018 at 09:16:17PM +0800, Lianbo Jiang wrote:
>> For AMD machine with SME feature, makedumpfile tools need to know
>> whether the crash kernel was encrypted or not. If SME is enabled
>> in the first kernel, the crash kernel's page table(pgd/pud/pmd/pte)
>> contains the memory encryption mask, so need to remove the sme mask
>> to obtain the true physical address.
>>
>> Signed-off-by: Lianbo Jiang 
>> ---
>>  arch/x86/kernel/machine_kexec_64.c | 14 ++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/arch/x86/kernel/machine_kexec_64.c 
>> b/arch/x86/kernel/machine_kexec_64.c
>> index 4c8acdfdc5a7..1860fe24117d 100644
>> --- a/arch/x86/kernel/machine_kexec_64.c
>> +++ b/arch/x86/kernel/machine_kexec_64.c
>> @@ -352,10 +352,24 @@ void machine_kexec(struct kimage *image)
>>  
>>  void arch_crash_save_vmcoreinfo(void)
>>  {
>> +u64 sme_mask = sme_me_mask;
>> +
>>  VMCOREINFO_NUMBER(phys_base);
>>  VMCOREINFO_SYMBOL(init_top_pgt);
>>  vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n",
>>  pgtable_l5_enabled());
>> +/*
>> + * Currently, the local variable 'sme_mask' stores the value of
>> + * sme_me_mask(bit 47), and also write the value of sme_mask to
>> + * the vmcoreinfo.
>> + * If need, the bit(sme_mask) might be redefined in the future,
>> + * but the 'bit63' will be reserved.
>> + * For example:
>> + * [ misc  ][ enc bit  ][ other misc SME info   ]
>> + * ____1000______..._
>> + * 63   59   55   51   47   43   39   35   31   27   ... 3
>> + */
> 
> This text belongs into the document.
> 
Ok, i will move it into VMCOREINFO document.

Thanks.


Re: [PATCH 1/2 v3] kdump: add the vmcoreinfo documentation

2018-12-17 Thread lijiang
在 2018年12月17日 21:00, Borislav Petkov 写道:
> On Sun, Dec 16, 2018 at 09:16:16PM +0800, Lianbo Jiang wrote:
>> +clear_idx
>> +=
>> +The index that the next printk record to read after the last 'clear'
>> +command. It indicates the first record after the last SYSLOG_ACTION
>> +_CLEAR, like issued by 'dmesg -c'.
> 
> What is that used for by the userspace tools?
> 

The clear_idx is used when dumping the dmesg log.

>> +
>> +log_next_idx
>> +
>> +The index of the next record to store in the buffer 'log_buf'. It helps
>> +to compute the index of current strings position.
>> +
>> +printk_log
>> +==
>> +The size of a structure 'printk_log'. It helps to compute the size of
>> +messages, and extract dmesg log.
> 
> What is the difference between that and log_buf?
> 

The printk_log is used to output human readable text, it will encapsulate header
information for log_buf, such as timestamp, syslog level, etc.

> 
> 
>> +
>> +(printk_log, ts_nsec|len|text_len|dict_len)
>> +===
>> +It represents these field offsets in the structure 'printk_log'. User
>> +space tools can parse it and detect any changes to structure down the
>> +line.
> 
> What does that mean? "any changes down the line"?
> 

User space tools can parse it and check whether the values of printk_log's 
members
have been changed. 

I will improve it in patch v4.


>> +
>> +(free_area.free_list, MIGRATE_TYPES)
>> +
>> +The number of migrate types for pages. The free_list is divided into
>> +the array, it needs to know the number of the array.
> 
> ... for?
> 

It needs to know the number of the array when makedumpfile computes the number 
of
free pages.

>> +
>> +NR_FREE_PAGES
>> +=
>> +On linux-2.6.21 or later, the number of free_pages is in
>> +vm_stat[NR_FREE_PAGES]. It can get the number of free pages from the
>> +array.
>> +
>> +PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|
>> +PG_hwpoision|PG_head_mask
>> +=
>> +It means the attribute of a page. These flags will be used to filter
>> +the free pages.
>> +
>> +PAGE_BUDDY_MAPCOUNT_VALUE or ~PG_buddy
>> +==
>> +The 'PG_buddy' flag indicates that the page is free and in the buddy
>> +system. Makedumpfile can exclude the free pages managed by a buddy.
> 
> That text belongs with the one above?
> 

It exported the value of (~PG_buddy), so it is placed here independently.

>> +
>> +HUGETLB_PAGE_DTOR
>> +=
>> +The 'HUGETLB_PAGE_DTOR' flag indicates the hugetlbfs pages. Makedumpfile
>> +will exclude these pages.
>> +
>> +
>> +x86_64 variables
>> +
>> +
>> +phys_base
>> +=
>> +In x86_64, the 'phys_base' is necessary to convert virtual address of
>> +exported kernel symbol to physical address.
>> +
>> +init_top_pgt
>> +
>> +The 'init_top_pgt' used to walk through the whole page table and convert
>> +virtual address to physical address.
> 
> This is the same as swapper_pg_dir?
> 

These two variables are somewhat similar, but they are used in different 
scenarios.

>> +
>> +pgtable_l5_enabled
>> +==
>> +User-space tools need to know whether the crash kernel was in 5-level
>> +paging mode or not.
>> +
>> +node_data
>> +=
>> +This is a struct 'pglist_data' array, it stores all numa nodes information.
>> +In general, Makedumpfile can get the pglist_data structure from symbol
>> +'node_data'.
>> +
>> +(node_data, MAX_NUMNODES)
>> +=
>> +The number of this 'node_data' array. It means the maximum number of the
>> +nodes in system.
>> +
>> +KERNELOFFSET
>> +
>> +Randomize the address of the kernel image. This is the offset of KASLR in
>> +VMCOREINFO ELF notes. It is used to compute the page offset in x86_64. If
>> +KASLE is disabled, this value is zero.
>> +
>> +KERNEL_IMAGE_SIZE
>> +=
>> +The size of 'KERNEL_IMAGE_SIZE', currently unused.
> 
> So remove?
> 

I'm not sure whether it should be removed, so i keep it.

>> +
>> +The old MODULES_VADDR need be decided by KERNEL_IMAGE_SIZE when kaslr
>> +enabled. Now MODULES_VADDR is not needed any more since Pratyush makes
>> +all VA to PA converting done by page table lookup.
> 
> Also, I did clean this up considerably - please include in your next
> version:
> 

Great, thanks for you help. I will post v4 later.

Regards,
Lianbo

> ---
> diff --git a/Documentation/kdump/vmcoreinfo.txt 
> b/Documentation/kdump/vmcoreinfo.txt
> index d71260bf383a..2ce34d952bfd 100644
> --- a/Documentation/kdump/vmcoreinfo.txt
> +++ b/Documentation/kdump/vmcoreinfo.txt
> @@ -1,18 +1,19 @@
>  
> - Documentation for VMCOREINFO
> + VMCOREINFO
>  
>  
>  ===
>  What is the 

Re: [PATCH 1/2 v8] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'

2018-12-11 Thread lijiang
在 2018年12月05日 05:33, Lendacky, Thomas 写道:
> On 11/29/2018 09:37 PM, Dave Young wrote:
>> + more people
>>
>> On 11/29/18 at 04:09pm, Lianbo Jiang wrote:
>>> When doing kexec_file_load, the first kernel needs to pass the e820
>>> reserved ranges to the second kernel. But kernel can not exactly
>>> match the e820 reserved ranges when walking through the iomem resources
>>> with the descriptor 'IORES_DESC_NONE', because several e820 types(
>>> e.g. E820_TYPE_RESERVED_KERN/E820_TYPE_RAM/E820_TYPE_UNUSABLE/E820
>>> _TYPE_RESERVED) are converted to the descriptor 'IORES_DESC_NONE'. It
>>> may pass these four types to the kdump kernel, that is not desired result.
>>>
>>> So, this patch adds a new I/O resource descriptor 'IORES_DESC_RESERVED'
>>> for the iomem resources search interfaces. It is helpful to exactly
>>> match the reserved resource ranges when walking through iomem resources.
>>>
>>> In addition, since the new descriptor 'IORES_DESC_RESERVED' is introduced,
>>> these code originally related to the descriptor 'IORES_DESC_NONE' need to
>>> be updated. Otherwise, it will be easily confused and also cause some
>>> errors. Because the 'E820_TYPE_RESERVED' type is converted to the new
>>> descriptor 'IORES_DESC_RESERVED' instead of 'IORES_DESC_NONE', it has been
>>> changed.
>>>
>>> Suggested-by: Dave Young 
>>> Signed-off-by: Lianbo Jiang 
>>> ---
>>>  arch/ia64/kernel/efi.c |  4 
>>>  arch/x86/kernel/e820.c |  2 +-
>>>  arch/x86/mm/ioremap.c  | 13 -
>>>  include/linux/ioport.h |  1 +
>>>  kernel/resource.c  |  6 +++---
>>>  5 files changed, 21 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c
>>> index 8f106638913c..1841e9b4db30 100644
>>> --- a/arch/ia64/kernel/efi.c
>>> +++ b/arch/ia64/kernel/efi.c
>>> @@ -1231,6 +1231,10 @@ efi_initialize_iomem_resources(struct resource 
>>> *code_resource,
>>> break;
>>>  
>>> case EFI_RESERVED_TYPE:
>>> +   name = "reserved";
>>
>> Ingo updated X86 code to use "Reserved",  I think it would be good to do
>> same for this case as well
>>
>>> +   desc = IORES_DESC_RESERVED;
>>> +   break;
>>> +
>>> case EFI_RUNTIME_SERVICES_CODE:
>>> case EFI_RUNTIME_SERVICES_DATA:
>>> case EFI_ACPI_RECLAIM_MEMORY:
>>
>> Originally, above 3 are all "reserved", so probably they all should be
>> IORES_DESC_RESERVED.
>>
>> Can any IA64 people to review this?
>>
>>> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
>>> index 50895c2f937d..57fafdafb860 100644
>>> --- a/arch/x86/kernel/e820.c
>>> +++ b/arch/x86/kernel/e820.c
>>> @@ -1048,10 +1048,10 @@ static unsigned long __init 
>>> e820_type_to_iores_desc(struct e820_entry *entry)
>>> case E820_TYPE_NVS: return IORES_DESC_ACPI_NV_STORAGE;
>>> case E820_TYPE_PMEM:return IORES_DESC_PERSISTENT_MEMORY;
>>> case E820_TYPE_PRAM:return 
>>> IORES_DESC_PERSISTENT_MEMORY_LEGACY;
>>> +   case E820_TYPE_RESERVED:return IORES_DESC_RESERVED;
>>> case E820_TYPE_RESERVED_KERN:   /* Fall-through: */
>>> case E820_TYPE_RAM: /* Fall-through: */
>>> case E820_TYPE_UNUSABLE:/* Fall-through: */
>>> -   case E820_TYPE_RESERVED:/* Fall-through: */
>>> default:return IORES_DESC_NONE;
>>> }
>>>  }
>>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>>> index 5378d10f1d31..fea2ef99415d 100644
>>> --- a/arch/x86/mm/ioremap.c
>>> +++ b/arch/x86/mm/ioremap.c
>>> @@ -83,7 +83,18 @@ static bool __ioremap_check_ram(struct resource *res)
>>>  
>>>  static int __ioremap_check_desc_other(struct resource *res)
>>>  {
>>> -   return (res->desc != IORES_DESC_NONE);
>>> +   /*
>>> +* But now, the 'E820_TYPE_RESERVED' type is converted to the new
>>> +* descriptor 'IORES_DESC_RESERVED' instead of 'IORES_DESC_NONE',
>>> +* it has been changed. And the value of 'mem_flags.desc_other'
>>> +* is equal to 'true' if we don't strengthen the condition in this
>>> +* function, that is wrong. Because originally it is equal to
>>> +* 'false' for the same reserved type.
>>> +*
>>> +* So, that would be nice to keep it the same as before.
>>> +*/
>>> +   return ((res->desc != IORES_DESC_NONE) &&
>>> +   (res->desc != IORES_DESC_RESERVED));
>>>  }
>>
>> Added Tom since he added the check function.  Is it possible to only
>> check explict valid desc types instead of exclude IORES_DESC_NONE?
> 
> Sorry for the delay...
> 
> The original intent of the check was to map most memory as encrypted under
> SEV if it was marked with a specific descriptor, since it was likely to
> not be MMIO. I tried converting most things that mapped memory to memremap
> vs ioremap, but ACPI was one area that I left alone and this check catches
> the mapping of the ACPI tables. I 

Re: [PATCH 1/2 v8] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'

2018-12-09 Thread lijiang
在 2018年12月07日 04:11, Borislav Petkov 写道:
> On Fri, Nov 30, 2018 at 03:04:44PM +0800, lijiang wrote:
>> I have noticed the changes on x86, but for IA64, i'm not sure whether it 
>> should do the same
>> thing, so keep it as before.
>>
>> If IA64 people would like to give any comment, that will be appreciated.
> 
> I think you should not touch ia64 and not make Tony unnecessarily power
> up the old rust.
> 
> :-)
> 
Ok, it's good to me. And i will get rid of these changes for ia64 in patch v9.

Thanks.
Lianbo


Re: [PATCH 2/2 v2] kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo

2018-12-09 Thread lijiang
在 2018年12月05日 18:24, Dave Young 写道:
> On 12/02/18 at 11:08am, Lianbo Jiang wrote:
>> For AMD machine with SME feature, makedumpfile tools need to know
>> whether the crash kernel was encrypted or not. If SME is enabled
>> in the first kernel, the crash kernel's page table(pgd/pud/pmd/pte)
>> contains the memory encryption mask, so need to remove the sme mask
>> to obtain the true physical address.
>>
>> Signed-off-by: Lianbo Jiang 
>> ---
>>  arch/x86/kernel/machine_kexec_64.c | 14 ++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/arch/x86/kernel/machine_kexec_64.c 
>> b/arch/x86/kernel/machine_kexec_64.c
>> index 4c8acdfdc5a7..1860fe24117d 100644
>> --- a/arch/x86/kernel/machine_kexec_64.c
>> +++ b/arch/x86/kernel/machine_kexec_64.c
>> @@ -352,10 +352,24 @@ void machine_kexec(struct kimage *image)
>>  
>>  void arch_crash_save_vmcoreinfo(void)
>>  {
>> +u64 sme_mask = sme_me_mask;
>> +
>>  VMCOREINFO_NUMBER(phys_base);
>>  VMCOREINFO_SYMBOL(init_top_pgt);
>>  vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n",
>>  pgtable_l5_enabled());
>> +/*
>> + * Currently, the local variable 'sme_mask' stores the value of
>> + * sme_me_mask(bit 47), and also write the value of sme_mask to
>> + * the vmcoreinfo.
>> + * If need, the bit(sme_mask) might be redefined in the future,
>> + * but the 'bit63' will be reserved.
>> + * For example:
>> + * [ misc  ][ enc bit  ][ other misc SME info   ]
>> + * ____1000______..._
>> + * 63   59   55   51   47   43   39   35   31   27   ... 3
>> + */
>> +VMCOREINFO_NUMBER(sme_mask);
> 
> #define VMCOREINFO_NUMBER(name) \
> vmcoreinfo_append_str("NUMBER(%s)=%ld\n", #name, (long)name)
> 
> VMCOREINFO_NUMBER is defined as above, so it looks questionable to add
> more users of that for different data types although it may work in real
> world.
> 

Thank you, Dave.
For the sme_mask, the bit47 is set '1', and the VMCOREINFO_NUMBER is a signed
64 bit number(x86_64), it is big enough to the sme_mask.


> A new macro like below may be better, may need to choose a better name
> though:
> _VMCOREINFO_NUMBER(name, format, type)
> so you can pass the format specifier and data types explictly
> 

That should be a good suggestion. But for now, maybe it is not time for 
improving it.
Because it is still big enough.

Thanks.
Lianbo

> 
>>  
>>  #ifdef CONFIG_NUMA
>>  VMCOREINFO_SYMBOL(node_data);
>> -- 
>> 2.17.1
>>
> 
> Thanks
> Dave
> 


Re: [PATCH 1/2 v2] kdump: add the vmcoreinfo documentation

2018-12-04 Thread lijiang
在 2018年12月03日 23:08, Borislav Petkov 写道:
> Add some more Ccs.
> 

Thanks a lot.

There are more people to review and improve this document together, that would
be fine.

> On Sun, Dec 02, 2018 at 11:08:38AM +0800, Lianbo Jiang wrote:
>> This document lists some variables that export to vmcoreinfo, and briefly
>> describles what these variables indicate. It should be instructive for
>> many people who do not know the vmcoreinfo, and it also normalizes the
>> exported variable as a standard ABI between kernel and use-space.
> 
> Yeah, I'm not sure about it being an ABI. Apparently, it is considered
> too tightly coupled to the kernel for it to be an ABI.
> 
> Regardless, thanks for doing that.
> 

It's my pleasure to do that.

>> Suggested-by: Borislav Petkov 
>> Signed-off-by: Lianbo Jiang 
>> ---
>>  Documentation/kdump/vmcoreinfo.txt | 400 +
>>  1 file changed, 400 insertions(+)
>>  create mode 100644 Documentation/kdump/vmcoreinfo.txt
>>
>> diff --git a/Documentation/kdump/vmcoreinfo.txt 
>> b/Documentation/kdump/vmcoreinfo.txt
> 
> Aren't we adding new docs in rst format only or what is the logic there?
> 
> Jon?
> 
>> new file mode 100644
>> index ..c6759be14af7
>> --- /dev/null
>> +++ b/Documentation/kdump/vmcoreinfo.txt
>> @@ -0,0 +1,400 @@
>> +
>> +Documentation for Vmcoreinfo
>> +
>> +
>> +===
>> +What is the vmcoreinfo?
>> +===
>> +The vmcoreinfo contains the first kernel's various information, for
> 
> The first sentence here should be explaining what VMCOREINFO is: it is
> an ELF PT_NOTE section. So that people can go, oh ok, it is a special
> ELF section, when reading.
> 
> Then, MAKEDUMPFILE(8) spells VMCOREINFO in all caps and I think we
> should do that too here, for ease of recognition.
> 

This is good advice.

> Btw, do we have a makedumpfile switch or a tool/script which dumps
> VMCOREINFO contents in human-readable form?
> 

Generating VMCOREINFO is easy in the first kernel, for example:
# makedumpfile -g VMCOREINFO -x vmlinux

# file VMCOREINFO
VMCOREINFO: ASCII text

> Maybe something nicer than:
> 
> $ hexdump -C /proc/kcore
> 
>> +example, structure size, page size, symbol values and field offset,
>> +etc. These data are encapsulated into an elf format, and these data
>> +will also help user-space tools(e.g. makedumpfile, crash) analyze the
>> +first kernel's memory usage.
>> +
>> +
>> +Common variables
>> +
>> +
>> +init_uts_ns.name.release
>> +
>> +The number of OS release.
>> +
>> +PAGE_SIZE
>> +=
>> +The size of a page. It is usually 4k bytes.
>> +
>> +init_uts_ns
>> +===
>> +This is the UTS namespace, which is used to isolate two specific elements
>> +of the system that relate to the uname system call. The UTS namespace is
>> +named after the data structure used to store information returned by the
>> +uname system call.
> 
> Those non-obvious exports should also have a short explanation why
> they're part of VMCOREINFO.
> 
>> +
>> +node_online_map
>> +===
>> +It is a macro definition, actually it is an arrary node_states[N_ONLINE],
>> +and it represents the set of online node in a system, one bit position
>> +per node number.
> 
> Ditto.
> 
> So yeah, people can find out what those things are but I think it is
> more important to state here *why* they're part of VMCOREINFO and how
> they're used and why they're exported.
> 

This is a good question.

For these two *why*, it should be easy to understand. Because user-space tools
need to know basic information, such as the symbol values, field offset, 
structure
size, etc. Otherwise, these tools won't know how to analyze the memory of the 
crash
kernel.

For the second question 'how they are used', we can get the answer from 
user-space
tools, such as makedumpfile, crash tools. Therefore, it may not need to explain 
any
more in kernel document. On the other hand, if we must put these contents into 
kernel
document, i have to say, that would be a hard task.


> Who knows, some might turn out to be not needed anymore. :)
> 
>> +
>> +swapper_pg_dir
>> +=
>> +It is always an array, it gerenally stands for the pgd for the kernel.
>> +When mmu is enabled in config file, the 'swapper_pg_dir' is valid.
>> +
>> +_stext
>> +==
>> +It is an assemble directive that defines the beginning of the text section.
> 
> That's an assembly symbol.
> 
>> +In gerenal, the '_stext' indicates the kernel start address.
>> +
>> +vmap_area_list
>> +==
>> +It stores the virtual area list, makedumpfile can get the vmalloc start
>> +value according to this variable.
> 
> "... from this variable."
> 
>> +
>> +mem_map
>> +===
>> +Physical addresses are translated to struct pages by treating them as an
>> +index into the mem_map 

Re: [PATCH 1/2 v2] kdump: add the vmcoreinfo documentation

2018-12-04 Thread lijiang
在 2018年12月03日 23:08, Borislav Petkov 写道:
> Add some more Ccs.
> 

Thanks a lot.

There are more people to review and improve this document together, that would
be fine.

> On Sun, Dec 02, 2018 at 11:08:38AM +0800, Lianbo Jiang wrote:
>> This document lists some variables that export to vmcoreinfo, and briefly
>> describles what these variables indicate. It should be instructive for
>> many people who do not know the vmcoreinfo, and it also normalizes the
>> exported variable as a standard ABI between kernel and use-space.
> 
> Yeah, I'm not sure about it being an ABI. Apparently, it is considered
> too tightly coupled to the kernel for it to be an ABI.
> 
> Regardless, thanks for doing that.
> 

It's my pleasure to do that.

>> Suggested-by: Borislav Petkov 
>> Signed-off-by: Lianbo Jiang 
>> ---
>>  Documentation/kdump/vmcoreinfo.txt | 400 +
>>  1 file changed, 400 insertions(+)
>>  create mode 100644 Documentation/kdump/vmcoreinfo.txt
>>
>> diff --git a/Documentation/kdump/vmcoreinfo.txt 
>> b/Documentation/kdump/vmcoreinfo.txt
> 
> Aren't we adding new docs in rst format only or what is the logic there?
> 
> Jon?
> 
>> new file mode 100644
>> index ..c6759be14af7
>> --- /dev/null
>> +++ b/Documentation/kdump/vmcoreinfo.txt
>> @@ -0,0 +1,400 @@
>> +
>> +Documentation for Vmcoreinfo
>> +
>> +
>> +===
>> +What is the vmcoreinfo?
>> +===
>> +The vmcoreinfo contains the first kernel's various information, for
> 
> The first sentence here should be explaining what VMCOREINFO is: it is
> an ELF PT_NOTE section. So that people can go, oh ok, it is a special
> ELF section, when reading.
> 
> Then, MAKEDUMPFILE(8) spells VMCOREINFO in all caps and I think we
> should do that too here, for ease of recognition.
> 

This is good advice.

> Btw, do we have a makedumpfile switch or a tool/script which dumps
> VMCOREINFO contents in human-readable form?
> 

Generating VMCOREINFO is easy in the first kernel, for example:
# makedumpfile -g VMCOREINFO -x vmlinux

# file VMCOREINFO
VMCOREINFO: ASCII text

> Maybe something nicer than:
> 
> $ hexdump -C /proc/kcore
> 
>> +example, structure size, page size, symbol values and field offset,
>> +etc. These data are encapsulated into an elf format, and these data
>> +will also help user-space tools(e.g. makedumpfile, crash) analyze the
>> +first kernel's memory usage.
>> +
>> +
>> +Common variables
>> +
>> +
>> +init_uts_ns.name.release
>> +
>> +The number of OS release.
>> +
>> +PAGE_SIZE
>> +=
>> +The size of a page. It is usually 4k bytes.
>> +
>> +init_uts_ns
>> +===
>> +This is the UTS namespace, which is used to isolate two specific elements
>> +of the system that relate to the uname system call. The UTS namespace is
>> +named after the data structure used to store information returned by the
>> +uname system call.
> 
> Those non-obvious exports should also have a short explanation why
> they're part of VMCOREINFO.
> 
>> +
>> +node_online_map
>> +===
>> +It is a macro definition, actually it is an arrary node_states[N_ONLINE],
>> +and it represents the set of online node in a system, one bit position
>> +per node number.
> 
> Ditto.
> 
> So yeah, people can find out what those things are but I think it is
> more important to state here *why* they're part of VMCOREINFO and how
> they're used and why they're exported.
> 

This is a good question.

For these two *why*, it should be easy to understand. Because user-space tools
need to know basic information, such as the symbol values, field offset, 
structure
size, etc. Otherwise, these tools won't know how to analyze the memory of the 
crash
kernel.

For the second question 'how they are used', we can get the answer from 
user-space
tools, such as makedumpfile, crash tools. Therefore, it may not need to explain 
any
more in kernel document. On the other hand, if we must put these contents into 
kernel
document, i have to say, that would be a hard task.


> Who knows, some might turn out to be not needed anymore. :)
> 
>> +
>> +swapper_pg_dir
>> +=
>> +It is always an array, it gerenally stands for the pgd for the kernel.
>> +When mmu is enabled in config file, the 'swapper_pg_dir' is valid.
>> +
>> +_stext
>> +==
>> +It is an assemble directive that defines the beginning of the text section.
> 
> That's an assembly symbol.
> 
>> +In gerenal, the '_stext' indicates the kernel start address.
>> +
>> +vmap_area_list
>> +==
>> +It stores the virtual area list, makedumpfile can get the vmalloc start
>> +value according to this variable.
> 
> "... from this variable."
> 
>> +
>> +mem_map
>> +===
>> +Physical addresses are translated to struct pages by treating them as an
>> +index into the mem_map 

Re: [tip:x86/mm] kdump, proc/vmcore: Enable kdumping encrypted memory with SME enabled

2018-10-08 Thread lijiang
在 2018年10月08日 21:43, Borislav Petkov 写道:
> On Mon, Oct 08, 2018 at 10:59:09AM +0200, Borislav Petkov wrote:
>> On Mon, Oct 08, 2018 at 04:47:34PM +0800, lijiang wrote:
>>> It looks like a good way to avoid the 'ifdefined', and it's also good 
>>> enough for i386.
>>>
>>> But for other architectures, such as POWERPC/ARM..., we will also have to 
>>> add the same 
>>> function for every architecture. Otherwise, i guess that they also have a 
>>> same compile
>>> error on other architectures.
>>
>> Yap, just realized that and looking at the rest of fs/proc/vmcore.c -
>> such functions are defined with the __weak attribute. Lemme see if that
>> works better.
> 
> Seems so. I'll hammer on it more today:
> 
Great! Thank you, Borislav.

Regards,
Lianbo
> ---
>  fs/proc/vmcore.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 42c32d06f7da..91ae16fbd7d5 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -187,6 +187,16 @@ int __weak remap_oldmem_pfn_range(struct vm_area_struct 
> *vma,
>   return remap_pfn_range(vma, from, pfn, size, prot);
>  }
>  
> +/*
> + * Architectures which support memory encryption override this.
> + */
> +ssize_t __weak
> +copy_oldmem_page_encrypted(unsigned long pfn, char *buf, size_t csize,
> +unsigned long offset, int userbuf)
> +{
> + return copy_oldmem_page(pfn, buf, csize, offset, userbuf);
> +}
> +
>  /*
>   * Copy to either kernel or user space
>   */
> 


Re: [tip:x86/mm] kdump, proc/vmcore: Enable kdumping encrypted memory with SME enabled

2018-10-08 Thread lijiang
在 2018年10月08日 21:43, Borislav Petkov 写道:
> On Mon, Oct 08, 2018 at 10:59:09AM +0200, Borislav Petkov wrote:
>> On Mon, Oct 08, 2018 at 04:47:34PM +0800, lijiang wrote:
>>> It looks like a good way to avoid the 'ifdefined', and it's also good 
>>> enough for i386.
>>>
>>> But for other architectures, such as POWERPC/ARM..., we will also have to 
>>> add the same 
>>> function for every architecture. Otherwise, i guess that they also have a 
>>> same compile
>>> error on other architectures.
>>
>> Yap, just realized that and looking at the rest of fs/proc/vmcore.c -
>> such functions are defined with the __weak attribute. Lemme see if that
>> works better.
> 
> Seems so. I'll hammer on it more today:
> 
Great! Thank you, Borislav.

Regards,
Lianbo
> ---
>  fs/proc/vmcore.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 42c32d06f7da..91ae16fbd7d5 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -187,6 +187,16 @@ int __weak remap_oldmem_pfn_range(struct vm_area_struct 
> *vma,
>   return remap_pfn_range(vma, from, pfn, size, prot);
>  }
>  
> +/*
> + * Architectures which support memory encryption override this.
> + */
> +ssize_t __weak
> +copy_oldmem_page_encrypted(unsigned long pfn, char *buf, size_t csize,
> +unsigned long offset, int userbuf)
> +{
> + return copy_oldmem_page(pfn, buf, csize, offset, userbuf);
> +}
> +
>  /*
>   * Copy to either kernel or user space
>   */
> 


Re: [tip:x86/mm] kdump, proc/vmcore: Enable kdumping encrypted memory with SME enabled

2018-10-08 Thread lijiang
在 2018年10月08日 16:00, Borislav Petkov 写道:
> On Mon, Oct 08, 2018 at 03:11:56PM +0800, lijiang wrote:
>> I used this ".config" to compile kernel in the attachment, and got a compile 
>> error.
>> Would you like to have a try?
>>
>> [root@hp-dl385g10-03 linux]# make ARCH=i386 -j32
>>   ..
>>   LD  vmlinux.o
>>   MODPOST vmlinux.o
>> fs/proc/vmcore.o:In function ‘read_from_oldmem’:
>> /home/linux/fs/proc/vmcore.c:127:undefined reference to 
>> ‘copy_oldmem_page_encrypted’
>> make: *** [vmlinux] error 1
> 
> Thanks, that triggered here. Ok, I guess something like this, to avoid
> the ugly ifdeffery:
> 
> ---
> diff --git a/arch/x86/kernel/crash_dump_32.c b/arch/x86/kernel/crash_dump_32.c
> index 33ee47670b99..8696800f2eea 100644
> --- a/arch/x86/kernel/crash_dump_32.c
> +++ b/arch/x86/kernel/crash_dump_32.c
> @@ -80,6 +80,16 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
>   return csize;
>  }
>  
> +/*
> + * 32-bit parrot version to avoid build errors.
> + */
> +ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf, size_t 
> csize,
> +unsigned long offset, int userbuf)
> +{
> + WARN_ON_ONCE(1);
> + return copy_oldmem_page(pfn, buf, csize, offset, userbuf);
> +}
> +

It looks like a good way to avoid the 'ifdefined', and it's also good enough 
for i386.

But for other architectures, such as POWERPC/ARM..., we will also have to add 
the same 
function for every architecture. Otherwise, i guess that they also have a same 
compile
error on other architectures.

Sometimes, it's hard to make a choice.

Regards,
Lianbo
>  static int __init kdump_buf_page_init(void)
>  {
>   int ret = 0;
> 
> 
> 


Re: [tip:x86/mm] kdump, proc/vmcore: Enable kdumping encrypted memory with SME enabled

2018-10-08 Thread lijiang
在 2018年10月08日 16:00, Borislav Petkov 写道:
> On Mon, Oct 08, 2018 at 03:11:56PM +0800, lijiang wrote:
>> I used this ".config" to compile kernel in the attachment, and got a compile 
>> error.
>> Would you like to have a try?
>>
>> [root@hp-dl385g10-03 linux]# make ARCH=i386 -j32
>>   ..
>>   LD  vmlinux.o
>>   MODPOST vmlinux.o
>> fs/proc/vmcore.o:In function ‘read_from_oldmem’:
>> /home/linux/fs/proc/vmcore.c:127:undefined reference to 
>> ‘copy_oldmem_page_encrypted’
>> make: *** [vmlinux] error 1
> 
> Thanks, that triggered here. Ok, I guess something like this, to avoid
> the ugly ifdeffery:
> 
> ---
> diff --git a/arch/x86/kernel/crash_dump_32.c b/arch/x86/kernel/crash_dump_32.c
> index 33ee47670b99..8696800f2eea 100644
> --- a/arch/x86/kernel/crash_dump_32.c
> +++ b/arch/x86/kernel/crash_dump_32.c
> @@ -80,6 +80,16 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
>   return csize;
>  }
>  
> +/*
> + * 32-bit parrot version to avoid build errors.
> + */
> +ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf, size_t 
> csize,
> +unsigned long offset, int userbuf)
> +{
> + WARN_ON_ONCE(1);
> + return copy_oldmem_page(pfn, buf, csize, offset, userbuf);
> +}
> +

It looks like a good way to avoid the 'ifdefined', and it's also good enough 
for i386.

But for other architectures, such as POWERPC/ARM..., we will also have to add 
the same 
function for every architecture. Otherwise, i guess that they also have a same 
compile
error on other architectures.

Sometimes, it's hard to make a choice.

Regards,
Lianbo
>  static int __init kdump_buf_page_init(void)
>  {
>   int ret = 0;
> 
> 
> 


Re: [tip:x86/mm] kdump, proc/vmcore: Enable kdumping encrypted memory with SME enabled

2018-10-08 Thread lijiang
在 2018年10月08日 13:37, Borislav Petkov 写道:
> On Mon, Oct 08, 2018 at 11:30:56AM +0800, lijiang wrote:
>> Yes. As previously mentioned, the correct patch is this one:
> 
> No, that chunk is not needed and I removed it. But I'd leave it as
> an exercise to you to figure out why... or to prove me wrong with a
> .config.
> 
> :-)
> 

I used this ".config" to compile kernel in the attachment, and got a compile 
error.
Would you like to have a try?

[root@hp-dl385g10-03 linux]# make ARCH=i386 -j32
  ..
  LD  vmlinux.o
  MODPOST vmlinux.o
fs/proc/vmcore.o:In function ‘read_from_oldmem’:
/home/linux/fs/proc/vmcore.c:127:undefined reference to 
‘copy_oldmem_page_encrypted’
make: *** [vmlinux] error 1


Regards,
Lianbo


i386_config.gz
Description: application/gzip


Re: [tip:x86/mm] kdump, proc/vmcore: Enable kdumping encrypted memory with SME enabled

2018-10-08 Thread lijiang
在 2018年10月08日 13:37, Borislav Petkov 写道:
> On Mon, Oct 08, 2018 at 11:30:56AM +0800, lijiang wrote:
>> Yes. As previously mentioned, the correct patch is this one:
> 
> No, that chunk is not needed and I removed it. But I'd leave it as
> an exercise to you to figure out why... or to prove me wrong with a
> .config.
> 
> :-)
> 

I used this ".config" to compile kernel in the attachment, and got a compile 
error.
Would you like to have a try?

[root@hp-dl385g10-03 linux]# make ARCH=i386 -j32
  ..
  LD  vmlinux.o
  MODPOST vmlinux.o
fs/proc/vmcore.o:In function ‘read_from_oldmem’:
/home/linux/fs/proc/vmcore.c:127:undefined reference to 
‘copy_oldmem_page_encrypted’
make: *** [vmlinux] error 1


Regards,
Lianbo


i386_config.gz
Description: application/gzip


Re: [tip:x86/mm] kdump, proc/vmcore: Enable kdumping encrypted memory with SME enabled

2018-10-07 Thread lijiang
在 2018年10月07日 16:47, Borislav Petkov 写道:
> On Sun, Oct 07, 2018 at 01:55:33PM +0800, lijiang wrote:
>> Here, it may be have a compile error.
> 
> Are you sure? The configs I tried worked fine but I'm open to being
> shown configs which fail the build.
> 

Yes. As previously mentioned, the correct patch is this one:

diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index 3e4ba9d753c8..84d8ddcb818e 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -26,6 +26,19 @@ extern int remap_oldmem_pfn_range(struct vm_area_struct *vma,
 
 extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
unsigned long, int);
+#if defined(CONFIG_AMD_MEM_ENCRYPT) || defined(CONFIG_X86_64)
+extern ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf,
+ size_t csize, unsigned long offset,
+ int userbuf);
+#else
+static inline
+ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf, size_t csize,
+  unsigned long offset, int userbuf)
+{
+   return 0;
+}
+#endif
+

I used the patch above to test six compile cases. All of them passed, there
was no compile error.

I'm not sure whether the kernel options or compile environment are different.
Would you like to share your kernel options(.config)? I will use your kernel
options to compile, and check whether i might also reproduce your compile 
error. 

1. x86_64 (CONFIG_X86_64=y)
   a. 
  CONFIG_AMD_MEM_ENCRYPT=y
  CONFIG_CRASH_DUMP=y

   b.
  # CONFIG_AMD_MEM_ENCRYPT is not set
  # CONFIG_CRASH_DUMP is not set

   c. 
  # CONFIG_AMD_MEM_ENCRYPT is not set
  CONFIG_CRASH_DUMP=y

   d. 
  CONFIG_AMD_MEM_ENCRYPT=y
  # CONFIG_CRASH_DUMP is not set

Compile command:
#make clean
#make ARCH=x86_64 -j32

2. i386 (CONFIG_X86_32=y)
   a. 
   CONFIG_CRASH_DUMP=y

   b.
   # CONFIG_CRASH_DUMP is not set

Compile command:
#make clean
#make ARCH=i386 -j32

Thanks.
Lianbo


Re: [tip:x86/mm] kdump, proc/vmcore: Enable kdumping encrypted memory with SME enabled

2018-10-07 Thread lijiang
在 2018年10月07日 16:47, Borislav Petkov 写道:
> On Sun, Oct 07, 2018 at 01:55:33PM +0800, lijiang wrote:
>> Here, it may be have a compile error.
> 
> Are you sure? The configs I tried worked fine but I'm open to being
> shown configs which fail the build.
> 

Yes. As previously mentioned, the correct patch is this one:

diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index 3e4ba9d753c8..84d8ddcb818e 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -26,6 +26,19 @@ extern int remap_oldmem_pfn_range(struct vm_area_struct *vma,
 
 extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
unsigned long, int);
+#if defined(CONFIG_AMD_MEM_ENCRYPT) || defined(CONFIG_X86_64)
+extern ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf,
+ size_t csize, unsigned long offset,
+ int userbuf);
+#else
+static inline
+ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf, size_t csize,
+  unsigned long offset, int userbuf)
+{
+   return 0;
+}
+#endif
+

I used the patch above to test six compile cases. All of them passed, there
was no compile error.

I'm not sure whether the kernel options or compile environment are different.
Would you like to share your kernel options(.config)? I will use your kernel
options to compile, and check whether i might also reproduce your compile 
error. 

1. x86_64 (CONFIG_X86_64=y)
   a. 
  CONFIG_AMD_MEM_ENCRYPT=y
  CONFIG_CRASH_DUMP=y

   b.
  # CONFIG_AMD_MEM_ENCRYPT is not set
  # CONFIG_CRASH_DUMP is not set

   c. 
  # CONFIG_AMD_MEM_ENCRYPT is not set
  CONFIG_CRASH_DUMP=y

   d. 
  CONFIG_AMD_MEM_ENCRYPT=y
  # CONFIG_CRASH_DUMP is not set

Compile command:
#make clean
#make ARCH=x86_64 -j32

2. i386 (CONFIG_X86_32=y)
   a. 
   CONFIG_CRASH_DUMP=y

   b.
   # CONFIG_CRASH_DUMP is not set

Compile command:
#make clean
#make ARCH=i386 -j32

Thanks.
Lianbo


Re: [tip:x86/mm] kdump, proc/vmcore: Enable kdumping encrypted memory with SME enabled

2018-10-06 Thread lijiang
copy_oldmem_page_encrypted - same as copy_oldmem_page() above but ioremap 
> the
> + * memory with the encryption mask set to accomodate kdump on SME-enabled
> + * machines.
> + */
> +ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf, size_t 
> csize,
> +unsigned long offset, int userbuf)
> +{
> + return __copy_oldmem_page(pfn, buf, csize, offset, userbuf, true);
> +}
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index cbde728f8ac6..42c32d06f7da 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -24,6 +24,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  #include "internal.h"
>  
> @@ -98,7 +100,8 @@ static int pfn_is_ram(unsigned long pfn)
>  
>  /* Reads a page from the oldmem device from given offset. */
>  static ssize_t read_from_oldmem(char *buf, size_t count,
> - u64 *ppos, int userbuf)
> + u64 *ppos, int userbuf,
> + bool encrypted)
>  {
>   unsigned long pfn, offset;
>   size_t nr_bytes;
> @@ -120,8 +123,15 @@ static ssize_t read_from_oldmem(char *buf, size_t count,
>   if (pfn_is_ram(pfn) == 0)
>   memset(buf, 0, nr_bytes);
>   else {
> - tmp = copy_oldmem_page(pfn, buf, nr_bytes,
> - offset, userbuf);
> + if (encrypted)
> + tmp = copy_oldmem_page_encrypted(pfn, buf,
> +  nr_bytes,
> +  offset,
> +  userbuf);
> + else
> + tmp = copy_oldmem_page(pfn, buf, nr_bytes,
> +offset, userbuf);
> +
>   if (tmp < 0)
>   return tmp;
>   }
> @@ -155,7 +165,7 @@ void __weak elfcorehdr_free(unsigned long long addr)
>   */
>  ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
>  {
> - return read_from_oldmem(buf, count, ppos, 0);
> + return read_from_oldmem(buf, count, ppos, 0, false);
>  }
>  
>  /*
> @@ -163,7 +173,7 @@ ssize_t __weak elfcorehdr_read(char *buf, size_t count, 
> u64 *ppos)
>   */
>  ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
>  {
> - return read_from_oldmem(buf, count, ppos, 0);
> + return read_from_oldmem(buf, count, ppos, 0, sme_active());
>  }
>  
>  /*
> @@ -173,6 +183,7 @@ int __weak remap_oldmem_pfn_range(struct vm_area_struct 
> *vma,
> unsigned long from, unsigned long pfn,
> unsigned long size, pgprot_t prot)
>  {
> + prot = pgprot_encrypted(prot);
>   return remap_pfn_range(vma, from, pfn, size, prot);
>  }
>  
> @@ -351,7 +362,8 @@ static ssize_t __read_vmcore(char *buffer, size_t buflen, 
> loff_t *fpos,
>   m->offset + m->size - *fpos,
>   buflen);
>   start = m->paddr + *fpos - m->offset;
> - tmp = read_from_oldmem(buffer, tsz, , userbuf);
> + tmp = read_from_oldmem(buffer, tsz, ,
> +userbuf, sme_active());
>   if (tmp < 0)
>   return tmp;
>   buflen -= tsz;
> diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
> index 3e4ba9d753c8..f774c5eb9e3c 100644
> --- a/include/linux/crash_dump.h
> +++ b/include/linux/crash_dump.h
> @@ -26,6 +26,10 @@ extern int remap_oldmem_pfn_range(struct vm_area_struct 
> *vma,
>  
>  extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
>   unsigned long, int);
> +extern ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf,
> +   size_t csize, unsigned long offset,
> +   int userbuf);
> +

Here, it may be have a compile error.
Links: https://lore.kernel.org/patchwork/patch/993337/
kbuild test robot Sept. 29, 2018, 6:25 p.m. UTC | #1

The correct patch is this one, you might refer to "Re: [PATCH v9 4/4] 
kdump/vmcore:support
encrypted old memory with SME enabled" or this links.
Links: https://lore.kernel.org/patchwork/patch/993538/#1177439
lijiang Sept. 30, 2018, 8:37 a.m. UTC | #2 

d

Re: [tip:x86/mm] kdump, proc/vmcore: Enable kdumping encrypted memory with SME enabled

2018-10-06 Thread lijiang
copy_oldmem_page_encrypted - same as copy_oldmem_page() above but ioremap 
> the
> + * memory with the encryption mask set to accomodate kdump on SME-enabled
> + * machines.
> + */
> +ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf, size_t 
> csize,
> +unsigned long offset, int userbuf)
> +{
> + return __copy_oldmem_page(pfn, buf, csize, offset, userbuf, true);
> +}
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index cbde728f8ac6..42c32d06f7da 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -24,6 +24,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  #include "internal.h"
>  
> @@ -98,7 +100,8 @@ static int pfn_is_ram(unsigned long pfn)
>  
>  /* Reads a page from the oldmem device from given offset. */
>  static ssize_t read_from_oldmem(char *buf, size_t count,
> - u64 *ppos, int userbuf)
> + u64 *ppos, int userbuf,
> + bool encrypted)
>  {
>   unsigned long pfn, offset;
>   size_t nr_bytes;
> @@ -120,8 +123,15 @@ static ssize_t read_from_oldmem(char *buf, size_t count,
>   if (pfn_is_ram(pfn) == 0)
>   memset(buf, 0, nr_bytes);
>   else {
> - tmp = copy_oldmem_page(pfn, buf, nr_bytes,
> - offset, userbuf);
> + if (encrypted)
> + tmp = copy_oldmem_page_encrypted(pfn, buf,
> +  nr_bytes,
> +  offset,
> +  userbuf);
> + else
> + tmp = copy_oldmem_page(pfn, buf, nr_bytes,
> +offset, userbuf);
> +
>   if (tmp < 0)
>   return tmp;
>   }
> @@ -155,7 +165,7 @@ void __weak elfcorehdr_free(unsigned long long addr)
>   */
>  ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
>  {
> - return read_from_oldmem(buf, count, ppos, 0);
> + return read_from_oldmem(buf, count, ppos, 0, false);
>  }
>  
>  /*
> @@ -163,7 +173,7 @@ ssize_t __weak elfcorehdr_read(char *buf, size_t count, 
> u64 *ppos)
>   */
>  ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
>  {
> - return read_from_oldmem(buf, count, ppos, 0);
> + return read_from_oldmem(buf, count, ppos, 0, sme_active());
>  }
>  
>  /*
> @@ -173,6 +183,7 @@ int __weak remap_oldmem_pfn_range(struct vm_area_struct 
> *vma,
> unsigned long from, unsigned long pfn,
> unsigned long size, pgprot_t prot)
>  {
> + prot = pgprot_encrypted(prot);
>   return remap_pfn_range(vma, from, pfn, size, prot);
>  }
>  
> @@ -351,7 +362,8 @@ static ssize_t __read_vmcore(char *buffer, size_t buflen, 
> loff_t *fpos,
>   m->offset + m->size - *fpos,
>   buflen);
>   start = m->paddr + *fpos - m->offset;
> - tmp = read_from_oldmem(buffer, tsz, , userbuf);
> + tmp = read_from_oldmem(buffer, tsz, ,
> +userbuf, sme_active());
>   if (tmp < 0)
>   return tmp;
>   buflen -= tsz;
> diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
> index 3e4ba9d753c8..f774c5eb9e3c 100644
> --- a/include/linux/crash_dump.h
> +++ b/include/linux/crash_dump.h
> @@ -26,6 +26,10 @@ extern int remap_oldmem_pfn_range(struct vm_area_struct 
> *vma,
>  
>  extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
>   unsigned long, int);
> +extern ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf,
> +   size_t csize, unsigned long offset,
> +   int userbuf);
> +

Here, it may be have a compile error.
Links: https://lore.kernel.org/patchwork/patch/993337/
kbuild test robot Sept. 29, 2018, 6:25 p.m. UTC | #1

The correct patch is this one, you might refer to "Re: [PATCH v9 4/4] 
kdump/vmcore:support
encrypted old memory with SME enabled" or this links.
Links: https://lore.kernel.org/patchwork/patch/993538/#1177439
lijiang Sept. 30, 2018, 8:37 a.m. UTC | #2 

d

Re: [PATCH v7 RESEND 4/4] kdump/vmcore: support encrypted old memory with SME enabled

2018-09-29 Thread lijiang
在 2018年09月29日 16:30, Borislav Petkov 写道:
> On Sat, Sep 29, 2018 at 02:24:52PM +0800, lijiang wrote:
>> At first, i added an input parameter for read_from_oldmem() because of 
>> encryption(SME). But
>> for avoiding to also add the same parameter for copy_oldmem_page(), so i 
>> added a new function
>> copy_oldmem_page_encrypted(). Maybe you had noticed that these functions 
>> were very similar.
> 
> If you have two very similar functions, you add a *static* workhorse function:
> 
> static ssize_t __copy_oldmem_page(unsigned long pfn, char *buf, size_t csize, 
> unsigned long offset,
> int userbuf, bool encrypted)
> 
> and you define two wrappers:
> 
> copy_oldmem_page()
> copy_oldmem_page_encrypted()
> 
> which both call __copy_oldmem_page() with the appropriate parameters.
> 

Great! Previously i was afraid that the maintainer might disagree with
rewriting the function copy_oldmem_page().

That's really great. I will modify this patch and post the series again.

Thanks.
Lianbo
> But you do *not* do a separate compilation unit just because. None of
> the reasons you've mentioned warrant having a separate compilation
> unit while you already have *the* perfect place to put everything -
> arch/x86/kernel/crash_dump_64.c
> 
>> So sorry, here "oldmem encrypted" means the "old memory is encrypted".
> 
> I know what it means - I'm trying to explain to you to write it out
> in plain english and not use some strange constructs like "oldmem
> encrypted".
> 
> A reader would wonder: why is this thing semi-abbreviated and in
> quotation marks? Does that mean anything special?
> 
> Our comments should not be write-only. So after you've written it, try
> to read it as someone who sees the code for the first time and think
> hard whether she/he will understand it.
> 
> Do you catch my drift now?
> 
Yes, got it. Thanks for your valuable time and patience.

Regards,
Lianbo


Re: [PATCH v7 RESEND 4/4] kdump/vmcore: support encrypted old memory with SME enabled

2018-09-29 Thread lijiang
在 2018年09月29日 16:30, Borislav Petkov 写道:
> On Sat, Sep 29, 2018 at 02:24:52PM +0800, lijiang wrote:
>> At first, i added an input parameter for read_from_oldmem() because of 
>> encryption(SME). But
>> for avoiding to also add the same parameter for copy_oldmem_page(), so i 
>> added a new function
>> copy_oldmem_page_encrypted(). Maybe you had noticed that these functions 
>> were very similar.
> 
> If you have two very similar functions, you add a *static* workhorse function:
> 
> static ssize_t __copy_oldmem_page(unsigned long pfn, char *buf, size_t csize, 
> unsigned long offset,
> int userbuf, bool encrypted)
> 
> and you define two wrappers:
> 
> copy_oldmem_page()
> copy_oldmem_page_encrypted()
> 
> which both call __copy_oldmem_page() with the appropriate parameters.
> 

Great! Previously i was afraid that the maintainer might disagree with
rewriting the function copy_oldmem_page().

That's really great. I will modify this patch and post the series again.

Thanks.
Lianbo
> But you do *not* do a separate compilation unit just because. None of
> the reasons you've mentioned warrant having a separate compilation
> unit while you already have *the* perfect place to put everything -
> arch/x86/kernel/crash_dump_64.c
> 
>> So sorry, here "oldmem encrypted" means the "old memory is encrypted".
> 
> I know what it means - I'm trying to explain to you to write it out
> in plain english and not use some strange constructs like "oldmem
> encrypted".
> 
> A reader would wonder: why is this thing semi-abbreviated and in
> quotation marks? Does that mean anything special?
> 
> Our comments should not be write-only. So after you've written it, try
> to read it as someone who sees the code for the first time and think
> hard whether she/he will understand it.
> 
> Do you catch my drift now?
> 
Yes, got it. Thanks for your valuable time and patience.

Regards,
Lianbo


Re: [PATCH v7 RESEND 1/4] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory

2018-09-27 Thread lijiang
在 2018年09月28日 00:10, Borislav Petkov 写道:
> On Thu, Sep 27, 2018 at 10:53:47PM +0800, lijiang wrote:
>> If no need to break this line, it will cause a warning of exceeding 80 
>> characters per line.
> 
> That's fine - we don't take the 80 cols rule blindly but apply common
> sense. In this particular case the lines can stick out because they're
> simply externs and are meant to be skimmed over and not really read. :)
> 
Ok, i see. Thanks.

>> Thank you for pointing out this issue, i forgot to remove this header file. 
>>
>> I will get rid of this header file and post this patch again.
> 
> No need - already fixed that up.
> 
Great, thank you so much for your help.

Regards,
Lianbo
> Thx.
> 


Re: [PATCH v7 RESEND 1/4] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory

2018-09-27 Thread lijiang
在 2018年09月28日 00:10, Borislav Petkov 写道:
> On Thu, Sep 27, 2018 at 10:53:47PM +0800, lijiang wrote:
>> If no need to break this line, it will cause a warning of exceeding 80 
>> characters per line.
> 
> That's fine - we don't take the 80 cols rule blindly but apply common
> sense. In this particular case the lines can stick out because they're
> simply externs and are meant to be skimmed over and not really read. :)
> 
Ok, i see. Thanks.

>> Thank you for pointing out this issue, i forgot to remove this header file. 
>>
>> I will get rid of this header file and post this patch again.
> 
> No need - already fixed that up.
> 
Great, thank you so much for your help.

Regards,
Lianbo
> Thx.
> 


Re: [PATCH v7 RESEND 1/4] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory

2018-09-27 Thread lijiang
在 2018年09月27日 21:17, Borislav Petkov 写道:
> On Thu, Sep 27, 2018 at 03:19:51PM +0800, Lianbo Jiang wrote:
>> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>> index 6de64840dd22..f8795f9581c7 100644
>> --- a/arch/x86/include/asm/io.h
>> +++ b/arch/x86/include/asm/io.h
>> @@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t 
>> offset, unsigned long size);
>>  #define ioremap_cache ioremap_cache
>>  extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long 
>> size, unsigned long prot_val);
>>  #define ioremap_prot ioremap_prot
>> +extern void __iomem *ioremap_encrypted(resource_size_t phys_addr,
>> +unsigned long size);
> 
> No need to break this line - see how the other externs don't.
> 
Ok, i will fix it. 

If no need to break this line, it will cause a warning of exceeding 80 
characters per line.

>> +#define ioremap_encrypted ioremap_encrypted
>>  
>>  /**
>>   * ioremap -   map bus memory into CPU space
>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>> index c63a545ec199..e01e6c695add 100644
>> --- a/arch/x86/mm/ioremap.c
>> +++ b/arch/x86/mm/ioremap.c
>> @@ -24,6 +24,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
> 
> What is that include for and why is it not up there with the  includes instead here with the  ones?
> 
Thank you for pointing out this issue, i forgot to remove this header file. 

I will get rid of this header file and post this patch again.

Regards,
Lianbo


Re: [PATCH v7 RESEND 1/4] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory

2018-09-27 Thread lijiang
在 2018年09月27日 21:17, Borislav Petkov 写道:
> On Thu, Sep 27, 2018 at 03:19:51PM +0800, Lianbo Jiang wrote:
>> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>> index 6de64840dd22..f8795f9581c7 100644
>> --- a/arch/x86/include/asm/io.h
>> +++ b/arch/x86/include/asm/io.h
>> @@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t 
>> offset, unsigned long size);
>>  #define ioremap_cache ioremap_cache
>>  extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long 
>> size, unsigned long prot_val);
>>  #define ioremap_prot ioremap_prot
>> +extern void __iomem *ioremap_encrypted(resource_size_t phys_addr,
>> +unsigned long size);
> 
> No need to break this line - see how the other externs don't.
> 
Ok, i will fix it. 

If no need to break this line, it will cause a warning of exceeding 80 
characters per line.

>> +#define ioremap_encrypted ioremap_encrypted
>>  
>>  /**
>>   * ioremap -   map bus memory into CPU space
>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>> index c63a545ec199..e01e6c695add 100644
>> --- a/arch/x86/mm/ioremap.c
>> +++ b/arch/x86/mm/ioremap.c
>> @@ -24,6 +24,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
> 
> What is that include for and why is it not up there with the  includes instead here with the  ones?
> 
Thank you for pointing out this issue, i forgot to remove this header file. 

I will get rid of this header file and post this patch again.

Regards,
Lianbo


Re: [PATCH 1/4 v8] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory

2018-09-26 Thread lijiang
When SME is enabled on AMD machine, the memory is encrypted in the first
kernel. In this case, SME also needs to be enabled in kdump kernel, and
we have to remap the old memory with the memory encryption mask.

Here we only talk about the case that SME is active in the first kernel,
and only care it's active too in kdump kernel. there are four cases we
need considered.

a. dump vmcore
   It is encrypted in the first kernel, and needs be read out in kdump
   kernel.

b. crash notes
   When dumping vmcore, the people usually need to read the useful
   information from notes, and the notes is also encrypted.

c. iommu device table
   It is allocated by kernel, need fill its pointer into mmio of amd iommu.
   It's encrypted in the first kernel, need read the old content to analyze
   and get useful information.

d. mmio of amd iommu
   Register reported by amd firmware, it's not RAM, we don't encrypt in
   both the first kernel and kdump kernel.

To achieve the goal, the solution is:
1. add a new bool parameter "encrypted" to __ioremap_caller()
   It is a low level function, and check the newly added parameter, if it's
   true and in kdump kernel, will remap the memory with sme mask.

2. add a new function ioremap_encrypted() to explicitly passed in a "true"
   value for "encrypted".
   For above a, b, c, we will call ioremap_encrypted();

3. adjust all existed ioremap wrapper functions, passed in "false" for
   encrypted to make them an before.

   ioremap_encrypted()\
   ioremap_cache() |
   ioremap_prot()  |
   ioremap_wt()|->__ioremap_caller()
   ioremap_wc()|
   ioremap_uc()|
   ioremap_nocache()  /

Signed-off-by: Lianbo Jiang 
---
Changes since v7:
1. Only modify patch log(suggested by Baoquan He)

 arch/x86/include/asm/io.h |  3 +++
 arch/x86/mm/ioremap.c | 25 +
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 6de64840dd22..f8795f9581c7 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t offset, 
unsigned long size);
 #define ioremap_cache ioremap_cache
 extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size, 
unsigned long prot_val);
 #define ioremap_prot ioremap_prot
+extern void __iomem *ioremap_encrypted(resource_size_t phys_addr,
+   unsigned long size);
+#define ioremap_encrypted ioremap_encrypted
 
 /**
  * ioremap -   map bus memory into CPU space
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index c63a545ec199..e01e6c695add 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "physaddr.h"
 
@@ -131,7 +132,8 @@ static void __ioremap_check_mem(resource_size_t addr, 
unsigned long size,
  * caller shouldn't need to know that small detail.
  */
 static void __iomem *__ioremap_caller(resource_size_t phys_addr,
-   unsigned long size, enum page_cache_mode pcm, void *caller)
+   unsigned long size, enum page_cache_mode pcm,
+   void *caller, bool encrypted)
 {
unsigned long offset, vaddr;
resource_size_t last_addr;
@@ -199,7 +201,7 @@ static void __iomem *__ioremap_caller(resource_size_t 
phys_addr,
 * resulting mapping.
 */
prot = PAGE_KERNEL_IO;
-   if (sev_active() && mem_flags.desc_other)
+   if ((sev_active() && mem_flags.desc_other) || encrypted)
prot = pgprot_encrypted(prot);
 
switch (pcm) {
@@ -291,7 +293,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, 
unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS;
 
return __ioremap_caller(phys_addr, size, pcm,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_nocache);
 
@@ -324,7 +326,7 @@ void __iomem *ioremap_uc(resource_size_t phys_addr, 
unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC;
 
return __ioremap_caller(phys_addr, size, pcm,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL_GPL(ioremap_uc);
 
@@ -341,7 +343,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
 void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_wc);
 
@@ -358,14 +360,21 @@ EXPORT_SYMBOL(ioremap_wc);
 void __iomem *ioremap_wt(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT,
-

Re: [PATCH 1/4 v8] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory

2018-09-26 Thread lijiang
When SME is enabled on AMD machine, the memory is encrypted in the first
kernel. In this case, SME also needs to be enabled in kdump kernel, and
we have to remap the old memory with the memory encryption mask.

Here we only talk about the case that SME is active in the first kernel,
and only care it's active too in kdump kernel. there are four cases we
need considered.

a. dump vmcore
   It is encrypted in the first kernel, and needs be read out in kdump
   kernel.

b. crash notes
   When dumping vmcore, the people usually need to read the useful
   information from notes, and the notes is also encrypted.

c. iommu device table
   It is allocated by kernel, need fill its pointer into mmio of amd iommu.
   It's encrypted in the first kernel, need read the old content to analyze
   and get useful information.

d. mmio of amd iommu
   Register reported by amd firmware, it's not RAM, we don't encrypt in
   both the first kernel and kdump kernel.

To achieve the goal, the solution is:
1. add a new bool parameter "encrypted" to __ioremap_caller()
   It is a low level function, and check the newly added parameter, if it's
   true and in kdump kernel, will remap the memory with sme mask.

2. add a new function ioremap_encrypted() to explicitly passed in a "true"
   value for "encrypted".
   For above a, b, c, we will call ioremap_encrypted();

3. adjust all existed ioremap wrapper functions, passed in "false" for
   encrypted to make them an before.

   ioremap_encrypted()\
   ioremap_cache() |
   ioremap_prot()  |
   ioremap_wt()|->__ioremap_caller()
   ioremap_wc()|
   ioremap_uc()|
   ioremap_nocache()  /

Signed-off-by: Lianbo Jiang 
---
Changes since v7:
1. Only modify patch log(suggested by Baoquan He)

 arch/x86/include/asm/io.h |  3 +++
 arch/x86/mm/ioremap.c | 25 +
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 6de64840dd22..f8795f9581c7 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t offset, 
unsigned long size);
 #define ioremap_cache ioremap_cache
 extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size, 
unsigned long prot_val);
 #define ioremap_prot ioremap_prot
+extern void __iomem *ioremap_encrypted(resource_size_t phys_addr,
+   unsigned long size);
+#define ioremap_encrypted ioremap_encrypted
 
 /**
  * ioremap -   map bus memory into CPU space
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index c63a545ec199..e01e6c695add 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "physaddr.h"
 
@@ -131,7 +132,8 @@ static void __ioremap_check_mem(resource_size_t addr, 
unsigned long size,
  * caller shouldn't need to know that small detail.
  */
 static void __iomem *__ioremap_caller(resource_size_t phys_addr,
-   unsigned long size, enum page_cache_mode pcm, void *caller)
+   unsigned long size, enum page_cache_mode pcm,
+   void *caller, bool encrypted)
 {
unsigned long offset, vaddr;
resource_size_t last_addr;
@@ -199,7 +201,7 @@ static void __iomem *__ioremap_caller(resource_size_t 
phys_addr,
 * resulting mapping.
 */
prot = PAGE_KERNEL_IO;
-   if (sev_active() && mem_flags.desc_other)
+   if ((sev_active() && mem_flags.desc_other) || encrypted)
prot = pgprot_encrypted(prot);
 
switch (pcm) {
@@ -291,7 +293,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, 
unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS;
 
return __ioremap_caller(phys_addr, size, pcm,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_nocache);
 
@@ -324,7 +326,7 @@ void __iomem *ioremap_uc(resource_size_t phys_addr, 
unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC;
 
return __ioremap_caller(phys_addr, size, pcm,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL_GPL(ioremap_uc);
 
@@ -341,7 +343,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
 void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_wc);
 
@@ -358,14 +360,21 @@ EXPORT_SYMBOL(ioremap_wc);
 void __iomem *ioremap_wt(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT,
-

Re: [PATCH 1/4 v7] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory

2018-09-26 Thread lijiang
Also cc maintainer and other reviewer. Thanks.

在 2018年09月26日 14:18, lijiang 写道:
> 在 2018年09月26日 10:21, Baoquan He 写道:
>> Hi Lianbo,
>>
>> On 09/07/18 at 04:18pm, Lianbo Jiang wrote:
>>> When SME is enabled on AMD machine, the memory is encrypted in the first
>>> kernel. In this case, SME also needs to be enabled in kdump kernel, and
>>> we have to remap the old memory with the memory encryption mask.
>>
>> This patch series looks good to me. One thing is in your v5 post, Boris
>> reviewed and complained about the git log, we worked together to make an
>> document to explain, wondering why you don't rearrange it into log of
>> this patch. Other than this, all looks fine.
>>
>> http://lkml.kernel.org/r/53536964-2b57-4630-de91-3d4da2b64...@redhat.com
>>
> Thank you, Baoquan.
> 
> Previously i had considered whether i should put these explaining into patch 
> log,
> because these contents are a little more, i might just put the description of
> Solution A into this patch log and post this patch again.
> 
> Lianbo
>>
>>>
>>> Signed-off-by: Lianbo Jiang 
>>> ---
>>>  arch/x86/include/asm/io.h |  3 +++
>>>  arch/x86/mm/ioremap.c | 25 +
>>>  2 files changed, 20 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>>> index 6de64840dd22..f8795f9581c7 100644
>>> --- a/arch/x86/include/asm/io.h
>>> +++ b/arch/x86/include/asm/io.h
>>> @@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t 
>>> offset, unsigned long size);
>>>  #define ioremap_cache ioremap_cache
>>>  extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long 
>>> size, unsigned long prot_val);
>>>  #define ioremap_prot ioremap_prot
>>> +extern void __iomem *ioremap_encrypted(resource_size_t phys_addr,
>>> +   unsigned long size);
>>> +#define ioremap_encrypted ioremap_encrypted
>>>  
>>>  /**
>>>   * ioremap -   map bus memory into CPU space
>>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>>> index c63a545ec199..e01e6c695add 100644
>>> --- a/arch/x86/mm/ioremap.c
>>> +++ b/arch/x86/mm/ioremap.c
>>> @@ -24,6 +24,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>  
>>>  #include "physaddr.h"
>>>  
>>> @@ -131,7 +132,8 @@ static void __ioremap_check_mem(resource_size_t addr, 
>>> unsigned long size,
>>>   * caller shouldn't need to know that small detail.
>>>   */
>>>  static void __iomem *__ioremap_caller(resource_size_t phys_addr,
>>> -   unsigned long size, enum page_cache_mode pcm, void *caller)
>>> +   unsigned long size, enum page_cache_mode pcm,
>>> +   void *caller, bool encrypted)
>>>  {
>>> unsigned long offset, vaddr;
>>> resource_size_t last_addr;
>>> @@ -199,7 +201,7 @@ static void __iomem *__ioremap_caller(resource_size_t 
>>> phys_addr,
>>>  * resulting mapping.
>>>  */
>>> prot = PAGE_KERNEL_IO;
>>> -   if (sev_active() && mem_flags.desc_other)
>>> +   if ((sev_active() && mem_flags.desc_other) || encrypted)
>>> prot = pgprot_encrypted(prot);
>>>  
>>> switch (pcm) {
>>> @@ -291,7 +293,7 @@ void __iomem *ioremap_nocache(resource_size_t 
>>> phys_addr, unsigned long size)
>>> enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS;
>>>  
>>> return __ioremap_caller(phys_addr, size, pcm,
>>> -   __builtin_return_address(0));
>>> +   __builtin_return_address(0), false);
>>>  }
>>>  EXPORT_SYMBOL(ioremap_nocache);
>>>  
>>> @@ -324,7 +326,7 @@ void __iomem *ioremap_uc(resource_size_t phys_addr, 
>>> unsigned long size)
>>> enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC;
>>>  
>>> return __ioremap_caller(phys_addr, size, pcm,
>>> -   __builtin_return_address(0));
>>> +   __builtin_return_address(0), false);
>>>  }
>>>  EXPORT_SYMBOL_GPL(ioremap_uc);
>>>  
>>> @@ -341,7 +343,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
>>>  void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
>>>  {
>>> return __ioremap_caller(phys_addr, size,

Re: [PATCH 1/4 v7] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory

2018-09-26 Thread lijiang
Also cc maintainer and other reviewer. Thanks.

在 2018年09月26日 14:18, lijiang 写道:
> 在 2018年09月26日 10:21, Baoquan He 写道:
>> Hi Lianbo,
>>
>> On 09/07/18 at 04:18pm, Lianbo Jiang wrote:
>>> When SME is enabled on AMD machine, the memory is encrypted in the first
>>> kernel. In this case, SME also needs to be enabled in kdump kernel, and
>>> we have to remap the old memory with the memory encryption mask.
>>
>> This patch series looks good to me. One thing is in your v5 post, Boris
>> reviewed and complained about the git log, we worked together to make an
>> document to explain, wondering why you don't rearrange it into log of
>> this patch. Other than this, all looks fine.
>>
>> http://lkml.kernel.org/r/53536964-2b57-4630-de91-3d4da2b64...@redhat.com
>>
> Thank you, Baoquan.
> 
> Previously i had considered whether i should put these explaining into patch 
> log,
> because these contents are a little more, i might just put the description of
> Solution A into this patch log and post this patch again.
> 
> Lianbo
>>
>>>
>>> Signed-off-by: Lianbo Jiang 
>>> ---
>>>  arch/x86/include/asm/io.h |  3 +++
>>>  arch/x86/mm/ioremap.c | 25 +
>>>  2 files changed, 20 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>>> index 6de64840dd22..f8795f9581c7 100644
>>> --- a/arch/x86/include/asm/io.h
>>> +++ b/arch/x86/include/asm/io.h
>>> @@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t 
>>> offset, unsigned long size);
>>>  #define ioremap_cache ioremap_cache
>>>  extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long 
>>> size, unsigned long prot_val);
>>>  #define ioremap_prot ioremap_prot
>>> +extern void __iomem *ioremap_encrypted(resource_size_t phys_addr,
>>> +   unsigned long size);
>>> +#define ioremap_encrypted ioremap_encrypted
>>>  
>>>  /**
>>>   * ioremap -   map bus memory into CPU space
>>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>>> index c63a545ec199..e01e6c695add 100644
>>> --- a/arch/x86/mm/ioremap.c
>>> +++ b/arch/x86/mm/ioremap.c
>>> @@ -24,6 +24,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>  
>>>  #include "physaddr.h"
>>>  
>>> @@ -131,7 +132,8 @@ static void __ioremap_check_mem(resource_size_t addr, 
>>> unsigned long size,
>>>   * caller shouldn't need to know that small detail.
>>>   */
>>>  static void __iomem *__ioremap_caller(resource_size_t phys_addr,
>>> -   unsigned long size, enum page_cache_mode pcm, void *caller)
>>> +   unsigned long size, enum page_cache_mode pcm,
>>> +   void *caller, bool encrypted)
>>>  {
>>> unsigned long offset, vaddr;
>>> resource_size_t last_addr;
>>> @@ -199,7 +201,7 @@ static void __iomem *__ioremap_caller(resource_size_t 
>>> phys_addr,
>>>  * resulting mapping.
>>>  */
>>> prot = PAGE_KERNEL_IO;
>>> -   if (sev_active() && mem_flags.desc_other)
>>> +   if ((sev_active() && mem_flags.desc_other) || encrypted)
>>> prot = pgprot_encrypted(prot);
>>>  
>>> switch (pcm) {
>>> @@ -291,7 +293,7 @@ void __iomem *ioremap_nocache(resource_size_t 
>>> phys_addr, unsigned long size)
>>> enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS;
>>>  
>>> return __ioremap_caller(phys_addr, size, pcm,
>>> -   __builtin_return_address(0));
>>> +   __builtin_return_address(0), false);
>>>  }
>>>  EXPORT_SYMBOL(ioremap_nocache);
>>>  
>>> @@ -324,7 +326,7 @@ void __iomem *ioremap_uc(resource_size_t phys_addr, 
>>> unsigned long size)
>>> enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC;
>>>  
>>> return __ioremap_caller(phys_addr, size, pcm,
>>> -   __builtin_return_address(0));
>>> +   __builtin_return_address(0), false);
>>>  }
>>>  EXPORT_SYMBOL_GPL(ioremap_uc);
>>>  
>>> @@ -341,7 +343,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
>>>  void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
>>>  {
>>> return __ioremap_caller(phys_addr, size,

Re: [PATCH] resource: fix an error which walks through iomem resources

2018-09-19 Thread lijiang
Please ignore this patch, i have put this patch into another patches series, 
you can
refer to the link below:

https://lore.kernel.org/patchwork/patch/988431/
https://lore.kernel.org/patchwork/patch/988432/
https://lore.kernel.org/patchwork/patch/988433/

Thanks.

在 2018年09月17日 14:20, Lianbo Jiang 写道:
> When we walk through iomem resources by calling walk_iomem_res_desc(), the
> values of the function parameter may be modified in the while loop of __walk
> _iomem_res_desc(), which will cause us to not get the desired result in some
> cases.
> 
> At present, it only restores the original value of res->end, but it doesn't
> restore the original value of res->flags in the while loop of __walk_iomem
> _res_desc(). Whenever the find_next_iomem_res() finds a resource and returns
> the result, the original values of this resource will be modified, which might
> lead to an error in the next loop. For example:
> 
> The original value of resource flags is: res->flags=0x8200(initial value)
> 
> p->flags   _ 0x81000200 __ 0x8200 _
>   /  \  /  \
> ||___A||_._|__B_|..___|
> 0 0x
> (memory address ranges)
> 
> Note: if ((p->flags & res->flags) != res->flags) continue;
> 
> When the resource A is found, the original value of this resource flags will
> be changed to 0x81000200(res->flags=0x81000200), and continue to look for the
> next resource, when the loop reaches resource B, it can not get the resource B
> any more(you can refer to the for loop of find_next_iomem_res()), because the
> value of conditional expression will become true and will also jump the 
> resource
> B. In fact, we should get the resource A and B when we walk through the whole
> tree, but it only gets the resource A, the resource B is missed.
> 
> Signed-off-by: Lianbo Jiang 
> ---
>  kernel/resource.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 30e1bc68503b..f5d9fc70a04c 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -375,6 +375,7 @@ static int __walk_iomem_res_desc(struct resource *res, 
> unsigned long desc,
>int (*func)(struct resource *, void *))
>  {
>   u64 orig_end = res->end;
> + u64 orig_flags = res->flags;
>   int ret = -1;
>  
>   while ((res->start < res->end) &&
> @@ -385,6 +386,7 @@ static int __walk_iomem_res_desc(struct resource *res, 
> unsigned long desc,
>  
>   res->start = res->end + 1;
>   res->end = orig_end;
> + res->flags = orig_flags;
>   }
>  
>   return ret;
> 


Re: [PATCH] resource: fix an error which walks through iomem resources

2018-09-19 Thread lijiang
Please ignore this patch, i have put this patch into another patches series, 
you can
refer to the link below:

https://lore.kernel.org/patchwork/patch/988431/
https://lore.kernel.org/patchwork/patch/988432/
https://lore.kernel.org/patchwork/patch/988433/

Thanks.

在 2018年09月17日 14:20, Lianbo Jiang 写道:
> When we walk through iomem resources by calling walk_iomem_res_desc(), the
> values of the function parameter may be modified in the while loop of __walk
> _iomem_res_desc(), which will cause us to not get the desired result in some
> cases.
> 
> At present, it only restores the original value of res->end, but it doesn't
> restore the original value of res->flags in the while loop of __walk_iomem
> _res_desc(). Whenever the find_next_iomem_res() finds a resource and returns
> the result, the original values of this resource will be modified, which might
> lead to an error in the next loop. For example:
> 
> The original value of resource flags is: res->flags=0x8200(initial value)
> 
> p->flags   _ 0x81000200 __ 0x8200 _
>   /  \  /  \
> ||___A||_._|__B_|..___|
> 0 0x
> (memory address ranges)
> 
> Note: if ((p->flags & res->flags) != res->flags) continue;
> 
> When the resource A is found, the original value of this resource flags will
> be changed to 0x81000200(res->flags=0x81000200), and continue to look for the
> next resource, when the loop reaches resource B, it can not get the resource B
> any more(you can refer to the for loop of find_next_iomem_res()), because the
> value of conditional expression will become true and will also jump the 
> resource
> B. In fact, we should get the resource A and B when we walk through the whole
> tree, but it only gets the resource A, the resource B is missed.
> 
> Signed-off-by: Lianbo Jiang 
> ---
>  kernel/resource.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 30e1bc68503b..f5d9fc70a04c 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -375,6 +375,7 @@ static int __walk_iomem_res_desc(struct resource *res, 
> unsigned long desc,
>int (*func)(struct resource *, void *))
>  {
>   u64 orig_end = res->end;
> + u64 orig_flags = res->flags;
>   int ret = -1;
>  
>   while ((res->start < res->end) &&
> @@ -385,6 +386,7 @@ static int __walk_iomem_res_desc(struct resource *res, 
> unsigned long desc,
>  
>   res->start = res->end + 1;
>   res->end = orig_end;
> + res->flags = orig_flags;
>   }
>  
>   return ret;
> 


Re: [PATCH 2/2] x86/kexec_file: add reserved e820 ranges to 2nd kernel e820 table

2018-09-18 Thread lijiang
在 2018年09月18日 11:20, Dave Young 写道:
> On 09/18/18 at 10:48am, Lianbo Jiang wrote:
>> e820 reserved ranges is useful in kdump kernel, we have added this in
>> kexec-tools code.
>>
>> One reason is PCI mmconf (extended mode) requires reserved region
>> otherwise it falls back to legacy mode.
>>
>> When AMD SME kdump support, it needs to map dmi table area as unencrypted.
>> For normal boot these ranges sit in e820 reserved ranges thus the early
>> ioremap code naturally map them as unencrypted. So if we have same e820
>> reserve setup in kdump kernel then it will just work like normal kernel.
>>
>> Signed-off-by: Dave Young 
>> Signed-off-by: Lianbo Jiang 
>> ---
>>  arch/x86/kernel/crash.c | 6 ++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
>> index 3c113e6545a3..db453e9c117b 100644
>> --- a/arch/x86/kernel/crash.c
>> +++ b/arch/x86/kernel/crash.c
>> @@ -384,6 +384,12 @@ int crash_setup_memmap_entries(struct kimage *image, 
>> struct boot_params *params)
>>  walk_iomem_res_desc(IORES_DESC_ACPI_NV_STORAGE, flags, 0, -1, ,
>>  memmap_entry_callback);
>>  
>> +/* Add all reserved ranges */
>> +cmd.type = E820_TYPE_RESERVED;
>> +flags = IORESOURCE_MEM;
> 
> Lianbo, rethink about this, we will miss other io resource types if only
> match IORESOURCE_MEM here, can you redo the patch with just using "0"
> for the passing flags?
> 

This patches align on kexec-tools for e820 reserved ranges, if so, the 
kexec-tools also need to
be improved for the other type, such as IORESOURCE_IO/BUS/DMA(...), right?

I will improve these patches and post v2 tomorrow.

Thanks.

>> +walk_iomem_res_desc(IORES_DESC_NONE, flags, 0, -1, ,
>> +memmap_entry_callback);
>> +
>>  /* Add crashk_low_res region */
>>  if (crashk_low_res.end) {
>>  ei.addr = crashk_low_res.start;
>> -- 
>> 2.17.1
>>


Re: [PATCH 2/2] x86/kexec_file: add reserved e820 ranges to 2nd kernel e820 table

2018-09-18 Thread lijiang
在 2018年09月18日 11:20, Dave Young 写道:
> On 09/18/18 at 10:48am, Lianbo Jiang wrote:
>> e820 reserved ranges is useful in kdump kernel, we have added this in
>> kexec-tools code.
>>
>> One reason is PCI mmconf (extended mode) requires reserved region
>> otherwise it falls back to legacy mode.
>>
>> When AMD SME kdump support, it needs to map dmi table area as unencrypted.
>> For normal boot these ranges sit in e820 reserved ranges thus the early
>> ioremap code naturally map them as unencrypted. So if we have same e820
>> reserve setup in kdump kernel then it will just work like normal kernel.
>>
>> Signed-off-by: Dave Young 
>> Signed-off-by: Lianbo Jiang 
>> ---
>>  arch/x86/kernel/crash.c | 6 ++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
>> index 3c113e6545a3..db453e9c117b 100644
>> --- a/arch/x86/kernel/crash.c
>> +++ b/arch/x86/kernel/crash.c
>> @@ -384,6 +384,12 @@ int crash_setup_memmap_entries(struct kimage *image, 
>> struct boot_params *params)
>>  walk_iomem_res_desc(IORES_DESC_ACPI_NV_STORAGE, flags, 0, -1, ,
>>  memmap_entry_callback);
>>  
>> +/* Add all reserved ranges */
>> +cmd.type = E820_TYPE_RESERVED;
>> +flags = IORESOURCE_MEM;
> 
> Lianbo, rethink about this, we will miss other io resource types if only
> match IORESOURCE_MEM here, can you redo the patch with just using "0"
> for the passing flags?
> 

This patches align on kexec-tools for e820 reserved ranges, if so, the 
kexec-tools also need to
be improved for the other type, such as IORESOURCE_IO/BUS/DMA(...), right?

I will improve these patches and post v2 tomorrow.

Thanks.

>> +walk_iomem_res_desc(IORES_DESC_NONE, flags, 0, -1, ,
>> +memmap_entry_callback);
>> +
>>  /* Add crashk_low_res region */
>>  if (crashk_low_res.end) {
>>  ei.addr = crashk_low_res.start;
>> -- 
>> 2.17.1
>>


Re: [PATCH 1/5 V5] Add a function(ioremap_encrypted) for kdump when AMD sme enabled

2018-07-03 Thread lijiang
在 2018年07月03日 17:39, Borislav Petkov 写道:
> On Tue, Jul 03, 2018 at 10:17:19AM +0800, lijiang wrote:
>> for example, the elfcorehdr. In fact, the elfcorehdr and notes
> 
> You mean this?
> 
>  ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
>  {
> -   return read_from_oldmem(buf, count, ppos, 0);
> +   return read_from_oldmem(buf, count, ppos, 0, sme_active());
> 
> That looks encrypted to me.
> 
The elf notes is an old memory, it is encrypted.

But the elf header is a crash kernel reserved memory, which is unencrypted.
 ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
 {
-   return read_from_oldmem(buf, count, ppos, 0);
+   return read_from_oldmem(buf, count, ppos, 0, false);
 }

They call the same function(read_from_oldmem->ioremap_cache), so it is hard to
properly remap the memory if we don't use the parameter to distinguish. 

Regards,
Lianbo
>> call the same function(read_from_oldmem->ioremap_cache), in this case,
>> it is very difficult to properly remap the memory if the caller don't
>> care whether the memory is encrypted.
> 
> So beef up the logic in __ioremap_caller() to figure out based on the
> address whether to access the memory encrypted or not. You can find out
> the elfcorehdr address in the capture kernel.
> 


Re: [PATCH 1/5 V5] Add a function(ioremap_encrypted) for kdump when AMD sme enabled

2018-07-03 Thread lijiang
在 2018年07月03日 17:39, Borislav Petkov 写道:
> On Tue, Jul 03, 2018 at 10:17:19AM +0800, lijiang wrote:
>> for example, the elfcorehdr. In fact, the elfcorehdr and notes
> 
> You mean this?
> 
>  ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
>  {
> -   return read_from_oldmem(buf, count, ppos, 0);
> +   return read_from_oldmem(buf, count, ppos, 0, sme_active());
> 
> That looks encrypted to me.
> 
The elf notes is an old memory, it is encrypted.

But the elf header is a crash kernel reserved memory, which is unencrypted.
 ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
 {
-   return read_from_oldmem(buf, count, ppos, 0);
+   return read_from_oldmem(buf, count, ppos, 0, false);
 }

They call the same function(read_from_oldmem->ioremap_cache), so it is hard to
properly remap the memory if we don't use the parameter to distinguish. 

Regards,
Lianbo
>> call the same function(read_from_oldmem->ioremap_cache), in this case,
>> it is very difficult to properly remap the memory if the caller don't
>> care whether the memory is encrypted.
> 
> So beef up the logic in __ioremap_caller() to figure out based on the
> address whether to access the memory encrypted or not. You can find out
> the elfcorehdr address in the capture kernel.
> 


Re: [PATCH 1/5 V5] Add a function(ioremap_encrypted) for kdump when AMD sme enabled

2018-07-03 Thread lijiang
在 2018年07月03日 10:17, lijiang 写道:
> 在 2018年07月02日 18:14, Borislav Petkov 写道:
>> On Mon, Jul 02, 2018 at 03:26:35PM +0800, Lianbo Jiang wrote:
>>> @@ -131,7 +132,8 @@ static void __ioremap_check_mem(resource_size_t addr, 
>>> unsigned long size,
>>>   * caller shouldn't need to know that small detail.
>>>   */
>>>  static void __iomem *__ioremap_caller(resource_size_t phys_addr,
>>> -   unsigned long size, enum page_cache_mode pcm, void *caller)
>>> +   unsigned long size, enum page_cache_mode pcm,
>>> +   void *caller, bool encrypted)
>>
>> So instead of sprinkling that @encrypted argument everywhere and then
>> setting it based on sme_active() ...
>>
>>>  {
>>> unsigned long offset, vaddr;
>>> resource_size_t last_addr;
>>> @@ -199,7 +201,7 @@ static void __iomem *__ioremap_caller(resource_size_t 
>>> phys_addr,
>>>  * resulting mapping.
>>>  */
>>> prot = PAGE_KERNEL_IO;
>>> -   if (sev_active() && mem_flags.desc_other)
>>> +   if ((sev_active() && mem_flags.desc_other) || encrypted)
>>
>> ... why can't you simply do your checks:
>>
>>  sme_active() && is_kdump_kernel()
>>
>> here so that __ioremap_caller() can automatically choose the proper
>> pgprot value when ioremapping the memory in the kdump kernel?
>>
>> And this way the callers don't even have to care whether the memory is
>> encrypted or not?
>>
> Thank you, Boris. I'm very glad to read your comments. That's a good idea, 
> but it has some
> unencrypted memory in kdump mode, for example, the elfcorehdr. In fact, the 
> elfcorehdr and
> notes call the same function(read_from_oldmem->ioremap_cache), in this case, 
> it is very
> difficult to properly remap the memory if the caller don't care whether the 
> memory is encrypted.
> 
Hi, Boris,
About this, maybe I should describe the more details.
For kdump, the elf header finally use the crash kernel reserved memory, it is 
not an old memory.
When we load the crash kernel image, kexec tools will copy the elf header from 
encrypted memory
region to unencrypted memory region, we know that a page of memory that is 
marked encrypted will
be automatically decrypted when read from DRAM if SME is enabled. This 
operation make us get the
plaintext, that is to say, it becomes unencrypted data, so we need to remap the 
elfcorehdr in un-
encrypted manners in kdump kernel, but elf notes use an old memory, it is 
encrypted. That is the
real reason why we need to use the different 
functions(ioremap_cache/ioremap_encrypted).
If the elf core header is set to be encrypted, we could need to know which part 
of memory is
allocated for the elfcorehdr in kernel space and change the page attribute, 
maybe which will have
to modify another common codes and kexec-tools.

If you agree with my explanation, i will add them to patch log and also post it 
again.
Welcome to review my patches again.

Thanks.
Lianbo

> Regards,
> Lianbo>>> prot = pgprot_encrypted(prot);
>>>  
>>> switch (pcm) {
>>


Re: [PATCH 1/5 V5] Add a function(ioremap_encrypted) for kdump when AMD sme enabled

2018-07-03 Thread lijiang
在 2018年07月03日 10:17, lijiang 写道:
> 在 2018年07月02日 18:14, Borislav Petkov 写道:
>> On Mon, Jul 02, 2018 at 03:26:35PM +0800, Lianbo Jiang wrote:
>>> @@ -131,7 +132,8 @@ static void __ioremap_check_mem(resource_size_t addr, 
>>> unsigned long size,
>>>   * caller shouldn't need to know that small detail.
>>>   */
>>>  static void __iomem *__ioremap_caller(resource_size_t phys_addr,
>>> -   unsigned long size, enum page_cache_mode pcm, void *caller)
>>> +   unsigned long size, enum page_cache_mode pcm,
>>> +   void *caller, bool encrypted)
>>
>> So instead of sprinkling that @encrypted argument everywhere and then
>> setting it based on sme_active() ...
>>
>>>  {
>>> unsigned long offset, vaddr;
>>> resource_size_t last_addr;
>>> @@ -199,7 +201,7 @@ static void __iomem *__ioremap_caller(resource_size_t 
>>> phys_addr,
>>>  * resulting mapping.
>>>  */
>>> prot = PAGE_KERNEL_IO;
>>> -   if (sev_active() && mem_flags.desc_other)
>>> +   if ((sev_active() && mem_flags.desc_other) || encrypted)
>>
>> ... why can't you simply do your checks:
>>
>>  sme_active() && is_kdump_kernel()
>>
>> here so that __ioremap_caller() can automatically choose the proper
>> pgprot value when ioremapping the memory in the kdump kernel?
>>
>> And this way the callers don't even have to care whether the memory is
>> encrypted or not?
>>
> Thank you, Boris. I'm very glad to read your comments. That's a good idea, 
> but it has some
> unencrypted memory in kdump mode, for example, the elfcorehdr. In fact, the 
> elfcorehdr and
> notes call the same function(read_from_oldmem->ioremap_cache), in this case, 
> it is very
> difficult to properly remap the memory if the caller don't care whether the 
> memory is encrypted.
> 
Hi, Boris,
About this, maybe I should describe the more details.
For kdump, the elf header finally use the crash kernel reserved memory, it is 
not an old memory.
When we load the crash kernel image, kexec tools will copy the elf header from 
encrypted memory
region to unencrypted memory region, we know that a page of memory that is 
marked encrypted will
be automatically decrypted when read from DRAM if SME is enabled. This 
operation make us get the
plaintext, that is to say, it becomes unencrypted data, so we need to remap the 
elfcorehdr in un-
encrypted manners in kdump kernel, but elf notes use an old memory, it is 
encrypted. That is the
real reason why we need to use the different 
functions(ioremap_cache/ioremap_encrypted).
If the elf core header is set to be encrypted, we could need to know which part 
of memory is
allocated for the elfcorehdr in kernel space and change the page attribute, 
maybe which will have
to modify another common codes and kexec-tools.

If you agree with my explanation, i will add them to patch log and also post it 
again.
Welcome to review my patches again.

Thanks.
Lianbo

> Regards,
> Lianbo>>> prot = pgprot_encrypted(prot);
>>>  
>>> switch (pcm) {
>>


Re: [PATCH 0/2] support kdump for AMD secure memory encryption(sme)

2018-05-22 Thread lijiang
在 2018年05月21日 21:23, Tom Lendacky 写道:
> On 5/20/2018 10:45 PM, lijiang wrote:
>> 在 2018年05月17日 21:45, lijiang 写道:
>>> 在 2018年05月15日 21:31, Tom Lendacky 写道:
>>>> On 5/14/2018 8:51 PM, Lianbo Jiang wrote:
>>>>> It is convenient to remap the old memory encrypted to the second kernel by
>>>>> calling ioremap_encrypted().
>>>>>
>>>>> When sme enabled on AMD server, we also need to support kdump. Because
>>>>> the memory is encrypted in the first kernel, we will remap the old memory
>>>>> encrypted to the second kernel(crash kernel), and sme is also enabled in
>>>>> the second kernel, otherwise the old memory encrypted can not be 
>>>>> decrypted.
>>>>> Because simply changing the value of a C-bit on a page will not
>>>>> automatically encrypt the existing contents of a page, and any data in the
>>>>> page prior to the C-bit modification will become unintelligible. A page of
>>>>> memory that is marked encrypted will be automatically decrypted when read
>>>>> from DRAM and will be automatically encrypted when written to DRAM.
>>>>>
>>>>> For the kdump, it is necessary to distinguish whether the memory is
>>>>> encrypted. Furthermore, we should also know which part of the memory is
>>>>> encrypted or decrypted. We will appropriately remap the memory according
>>>>> to the specific situation in order to tell cpu how to deal with the
>>>>> data(encrypted or decrypted). For example, when sme enabled, if the old
>>>>> memory is encrypted, we will remap the old memory in encrypted way, which
>>>>> will automatically decrypt the old memory encrypted when we read those 
>>>>> data
>>>>> from the remapping address.
>>>>>
>>>>>  --
>>>>> | first-kernel | second-kernel | kdump support |
>>>>> |  (mem_encrypt=on|off)|   (yes|no)| 
>>>>> |--+---+---|
>>>>> | on   | on| yes   |
>>>>> | off  | off   | yes   |
>>>>> | on   | off   | no|
>>>>> | off  | on| no|
>>>>> |__|___|___|
>>>>>
>>>>> Test tools:
>>>>> makedumpfile[v1.6.3]: https://github.com/LianboJ/makedumpfile
>>>>> commit e1de103eca8f (A draft for kdump vmcore about AMD SME)
>>>>> Author: Lianbo Jiang <liji...@redhat.com>
>>>>> Date:   Mon May 14 17:02:40 2018 +0800
>>>>> Note: This patch can only dump vmcore in the case of SME enabled.
>>>>>
>>>>> crash-7.2.1: https://github.com/crash-utility/crash.git
>>>>> commit 1e1bd9c4c1be (Fix for the "bpf" command display on Linux 4.17-rc1)
>>>>> Author: Dave Anderson <ander...@redhat.com>
>>>>> Date:   Fri May 11 15:54:32 2018 -0400
>>>>>
>>>>> Test environment:
>>>>> HP ProLiant DL385Gen10 AMD EPYC 7251
>>>>> 8-Core Processor
>>>>> 32768 MB memory
>>>>> 600 GB disk space
>>>>>
>>>>> Linux 4.17-rc4:
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>>>> commit 75bc37fefc44 ("Linux 4.17-rc4")
>>>>> Author: Linus Torvalds <torva...@linux-foundation.org>
>>>>> Date:   Sun May 6 16:57:38 2018 -1000
>>>>>
>>>>> Reference:
>>>>> AMD64 Architecture Programmer's Manual
>>>>> https://support.amd.com/TechDocs/24593.pdf
>>>>>
>>>>
>>>> Have you also tested this with SEV?  It would be nice if the kdump
>>>> changes you make work with both SME and SEV.
>>>>
>>> Thank you, Tom.
>>> This is a great question, we originally plan to implement SEV in subsequent 
>>> patches, and
>>> we are also working on SEV at present.
>>> Furthermore, we have another known issue that the system can't jump into 
>>> the second kernel
>>> when SME is enabled and kaslr is disabled in kdump mode. It seems that is a 
>>> complex problems,
>>> maybe it is related to kaslr and SME, currently, i'm not sure the root 
>>> cause, but we will
>>> also plan to fix it. Can you give me any adv

Re: [PATCH 0/2] support kdump for AMD secure memory encryption(sme)

2018-05-22 Thread lijiang
在 2018年05月21日 21:23, Tom Lendacky 写道:
> On 5/20/2018 10:45 PM, lijiang wrote:
>> 在 2018年05月17日 21:45, lijiang 写道:
>>> 在 2018年05月15日 21:31, Tom Lendacky 写道:
>>>> On 5/14/2018 8:51 PM, Lianbo Jiang wrote:
>>>>> It is convenient to remap the old memory encrypted to the second kernel by
>>>>> calling ioremap_encrypted().
>>>>>
>>>>> When sme enabled on AMD server, we also need to support kdump. Because
>>>>> the memory is encrypted in the first kernel, we will remap the old memory
>>>>> encrypted to the second kernel(crash kernel), and sme is also enabled in
>>>>> the second kernel, otherwise the old memory encrypted can not be 
>>>>> decrypted.
>>>>> Because simply changing the value of a C-bit on a page will not
>>>>> automatically encrypt the existing contents of a page, and any data in the
>>>>> page prior to the C-bit modification will become unintelligible. A page of
>>>>> memory that is marked encrypted will be automatically decrypted when read
>>>>> from DRAM and will be automatically encrypted when written to DRAM.
>>>>>
>>>>> For the kdump, it is necessary to distinguish whether the memory is
>>>>> encrypted. Furthermore, we should also know which part of the memory is
>>>>> encrypted or decrypted. We will appropriately remap the memory according
>>>>> to the specific situation in order to tell cpu how to deal with the
>>>>> data(encrypted or decrypted). For example, when sme enabled, if the old
>>>>> memory is encrypted, we will remap the old memory in encrypted way, which
>>>>> will automatically decrypt the old memory encrypted when we read those 
>>>>> data
>>>>> from the remapping address.
>>>>>
>>>>>  --
>>>>> | first-kernel | second-kernel | kdump support |
>>>>> |  (mem_encrypt=on|off)|   (yes|no)| 
>>>>> |--+---+---|
>>>>> | on   | on| yes   |
>>>>> | off  | off   | yes   |
>>>>> | on   | off   | no|
>>>>> | off  | on| no|
>>>>> |__|___|___|
>>>>>
>>>>> Test tools:
>>>>> makedumpfile[v1.6.3]: https://github.com/LianboJ/makedumpfile
>>>>> commit e1de103eca8f (A draft for kdump vmcore about AMD SME)
>>>>> Author: Lianbo Jiang 
>>>>> Date:   Mon May 14 17:02:40 2018 +0800
>>>>> Note: This patch can only dump vmcore in the case of SME enabled.
>>>>>
>>>>> crash-7.2.1: https://github.com/crash-utility/crash.git
>>>>> commit 1e1bd9c4c1be (Fix for the "bpf" command display on Linux 4.17-rc1)
>>>>> Author: Dave Anderson 
>>>>> Date:   Fri May 11 15:54:32 2018 -0400
>>>>>
>>>>> Test environment:
>>>>> HP ProLiant DL385Gen10 AMD EPYC 7251
>>>>> 8-Core Processor
>>>>> 32768 MB memory
>>>>> 600 GB disk space
>>>>>
>>>>> Linux 4.17-rc4:
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>>>> commit 75bc37fefc44 ("Linux 4.17-rc4")
>>>>> Author: Linus Torvalds 
>>>>> Date:   Sun May 6 16:57:38 2018 -1000
>>>>>
>>>>> Reference:
>>>>> AMD64 Architecture Programmer's Manual
>>>>> https://support.amd.com/TechDocs/24593.pdf
>>>>>
>>>>
>>>> Have you also tested this with SEV?  It would be nice if the kdump
>>>> changes you make work with both SME and SEV.
>>>>
>>> Thank you, Tom.
>>> This is a great question, we originally plan to implement SEV in subsequent 
>>> patches, and
>>> we are also working on SEV at present.
>>> Furthermore, we have another known issue that the system can't jump into 
>>> the second kernel
>>> when SME is enabled and kaslr is disabled in kdump mode. It seems that is a 
>>> complex problems,
>>> maybe it is related to kaslr and SME, currently, i'm not sure the root 
>>> cause, but we will
>>> also plan to fix it. Can you give me any advice about this issue?
>>>
>> Based on upstream 4.17-rc5, we ha

Re: [PATCH 0/2] support kdump for AMD secure memory encryption(sme)

2018-05-20 Thread lijiang
在 2018年05月17日 21:45, lijiang 写道:
> 在 2018年05月15日 21:31, Tom Lendacky 写道:
>> On 5/14/2018 8:51 PM, Lianbo Jiang wrote:
>>> It is convenient to remap the old memory encrypted to the second kernel by
>>> calling ioremap_encrypted().
>>>
>>> When sme enabled on AMD server, we also need to support kdump. Because
>>> the memory is encrypted in the first kernel, we will remap the old memory
>>> encrypted to the second kernel(crash kernel), and sme is also enabled in
>>> the second kernel, otherwise the old memory encrypted can not be decrypted.
>>> Because simply changing the value of a C-bit on a page will not
>>> automatically encrypt the existing contents of a page, and any data in the
>>> page prior to the C-bit modification will become unintelligible. A page of
>>> memory that is marked encrypted will be automatically decrypted when read
>>> from DRAM and will be automatically encrypted when written to DRAM.
>>>
>>> For the kdump, it is necessary to distinguish whether the memory is
>>> encrypted. Furthermore, we should also know which part of the memory is
>>> encrypted or decrypted. We will appropriately remap the memory according
>>> to the specific situation in order to tell cpu how to deal with the
>>> data(encrypted or decrypted). For example, when sme enabled, if the old
>>> memory is encrypted, we will remap the old memory in encrypted way, which
>>> will automatically decrypt the old memory encrypted when we read those data
>>> from the remapping address.
>>>
>>>  --
>>> | first-kernel | second-kernel | kdump support |
>>> |  (mem_encrypt=on|off)|   (yes|no)| 
>>> |--+---+---|
>>> | on   | on| yes   |
>>> | off  | off   | yes   |
>>> | on   | off   | no|
>>> | off  | on| no|
>>> |__|___|___|
>>>
>>> Test tools:
>>> makedumpfile[v1.6.3]: https://github.com/LianboJ/makedumpfile
>>> commit e1de103eca8f (A draft for kdump vmcore about AMD SME)
>>> Author: Lianbo Jiang <liji...@redhat.com>
>>> Date:   Mon May 14 17:02:40 2018 +0800
>>> Note: This patch can only dump vmcore in the case of SME enabled.
>>>
>>> crash-7.2.1: https://github.com/crash-utility/crash.git
>>> commit 1e1bd9c4c1be (Fix for the "bpf" command display on Linux 4.17-rc1)
>>> Author: Dave Anderson <ander...@redhat.com>
>>> Date:   Fri May 11 15:54:32 2018 -0400
>>>
>>> Test environment:
>>> HP ProLiant DL385Gen10 AMD EPYC 7251
>>> 8-Core Processor
>>> 32768 MB memory
>>> 600 GB disk space
>>>
>>> Linux 4.17-rc4:
>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>> commit 75bc37fefc44 ("Linux 4.17-rc4")
>>> Author: Linus Torvalds <torva...@linux-foundation.org>
>>> Date:   Sun May 6 16:57:38 2018 -1000
>>>
>>> Reference:
>>> AMD64 Architecture Programmer's Manual
>>> https://support.amd.com/TechDocs/24593.pdf
>>>
>>
>> Have you also tested this with SEV?  It would be nice if the kdump
>> changes you make work with both SME and SEV.
>>
> Thank you, Tom.
> This is a great question, we originally plan to implement SEV in subsequent 
> patches, and
> we are also working on SEV at present.
> Furthermore, we have another known issue that the system can't jump into the 
> second kernel
> when SME is enabled and kaslr is disabled in kdump mode. It seems that is a 
> complex problems,
> maybe it is related to kaslr and SME, currently, i'm not sure the root cause, 
> but we will
> also plan to fix it. Can you give me any advice about this issue?
>
Based on upstream 4.17-rc5, we have recently found a new issue that the system 
can't boot to
use kexec when SME is enabled and kaslr is disabled. Have you ever run into 
this issue? 
They have similar reproduction scenarios.

CC Tom & Baoquan

Thanks.
Lianbo

> Thanks.
> Lianbo
>> Thanks,
>> Tom
>>
>>> Lianbo Jiang (2):
>>>   add a function(ioremap_encrypted) for kdump when AMD sme enabled.
>>>   support kdump when AMD secure memory encryption is active
>>>
>>>  arch/x86/include/asm/dmi.h  | 14 +-
>>>  arch/x86/include/asm/io.h   |  2 ++
>>>  arch/x86/kernel/acpi/boot.c |  8 
>>>  arch/x86/kernel/crash_dump_64.c | 27 +++
>>>  arch/x86/mm/ioremap.c   | 25 +
>>>  drivers/acpi/tables.c   | 14 +-
>>>  drivers/iommu/amd_iommu_init.c  |  9 -
>>>  fs/proc/vmcore.c| 36 +++-
>>>  include/linux/crash_dump.h  |  4 
>>>  kernel/kexec_core.c | 12 
>>>  10 files changed, 135 insertions(+), 16 deletions(-)
>>>


Re: [PATCH 0/2] support kdump for AMD secure memory encryption(sme)

2018-05-20 Thread lijiang
在 2018年05月17日 21:45, lijiang 写道:
> 在 2018年05月15日 21:31, Tom Lendacky 写道:
>> On 5/14/2018 8:51 PM, Lianbo Jiang wrote:
>>> It is convenient to remap the old memory encrypted to the second kernel by
>>> calling ioremap_encrypted().
>>>
>>> When sme enabled on AMD server, we also need to support kdump. Because
>>> the memory is encrypted in the first kernel, we will remap the old memory
>>> encrypted to the second kernel(crash kernel), and sme is also enabled in
>>> the second kernel, otherwise the old memory encrypted can not be decrypted.
>>> Because simply changing the value of a C-bit on a page will not
>>> automatically encrypt the existing contents of a page, and any data in the
>>> page prior to the C-bit modification will become unintelligible. A page of
>>> memory that is marked encrypted will be automatically decrypted when read
>>> from DRAM and will be automatically encrypted when written to DRAM.
>>>
>>> For the kdump, it is necessary to distinguish whether the memory is
>>> encrypted. Furthermore, we should also know which part of the memory is
>>> encrypted or decrypted. We will appropriately remap the memory according
>>> to the specific situation in order to tell cpu how to deal with the
>>> data(encrypted or decrypted). For example, when sme enabled, if the old
>>> memory is encrypted, we will remap the old memory in encrypted way, which
>>> will automatically decrypt the old memory encrypted when we read those data
>>> from the remapping address.
>>>
>>>  --
>>> | first-kernel | second-kernel | kdump support |
>>> |  (mem_encrypt=on|off)|   (yes|no)| 
>>> |--+---+---|
>>> | on   | on| yes   |
>>> | off  | off   | yes   |
>>> | on   | off   | no|
>>> | off  | on| no|
>>> |__|___|___|
>>>
>>> Test tools:
>>> makedumpfile[v1.6.3]: https://github.com/LianboJ/makedumpfile
>>> commit e1de103eca8f (A draft for kdump vmcore about AMD SME)
>>> Author: Lianbo Jiang 
>>> Date:   Mon May 14 17:02:40 2018 +0800
>>> Note: This patch can only dump vmcore in the case of SME enabled.
>>>
>>> crash-7.2.1: https://github.com/crash-utility/crash.git
>>> commit 1e1bd9c4c1be (Fix for the "bpf" command display on Linux 4.17-rc1)
>>> Author: Dave Anderson 
>>> Date:   Fri May 11 15:54:32 2018 -0400
>>>
>>> Test environment:
>>> HP ProLiant DL385Gen10 AMD EPYC 7251
>>> 8-Core Processor
>>> 32768 MB memory
>>> 600 GB disk space
>>>
>>> Linux 4.17-rc4:
>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>> commit 75bc37fefc44 ("Linux 4.17-rc4")
>>> Author: Linus Torvalds 
>>> Date:   Sun May 6 16:57:38 2018 -1000
>>>
>>> Reference:
>>> AMD64 Architecture Programmer's Manual
>>> https://support.amd.com/TechDocs/24593.pdf
>>>
>>
>> Have you also tested this with SEV?  It would be nice if the kdump
>> changes you make work with both SME and SEV.
>>
> Thank you, Tom.
> This is a great question, we originally plan to implement SEV in subsequent 
> patches, and
> we are also working on SEV at present.
> Furthermore, we have another known issue that the system can't jump into the 
> second kernel
> when SME is enabled and kaslr is disabled in kdump mode. It seems that is a 
> complex problems,
> maybe it is related to kaslr and SME, currently, i'm not sure the root cause, 
> but we will
> also plan to fix it. Can you give me any advice about this issue?
>
Based on upstream 4.17-rc5, we have recently found a new issue that the system 
can't boot to
use kexec when SME is enabled and kaslr is disabled. Have you ever run into 
this issue? 
They have similar reproduction scenarios.

CC Tom & Baoquan

Thanks.
Lianbo

> Thanks.
> Lianbo
>> Thanks,
>> Tom
>>
>>> Lianbo Jiang (2):
>>>   add a function(ioremap_encrypted) for kdump when AMD sme enabled.
>>>   support kdump when AMD secure memory encryption is active
>>>
>>>  arch/x86/include/asm/dmi.h  | 14 +-
>>>  arch/x86/include/asm/io.h   |  2 ++
>>>  arch/x86/kernel/acpi/boot.c |  8 
>>>  arch/x86/kernel/crash_dump_64.c | 27 +++
>>>  arch/x86/mm/ioremap.c   | 25 +
>>>  drivers/acpi/tables.c   | 14 +-
>>>  drivers/iommu/amd_iommu_init.c  |  9 -
>>>  fs/proc/vmcore.c| 36 +++-
>>>  include/linux/crash_dump.h  |  4 
>>>  kernel/kexec_core.c | 12 
>>>  10 files changed, 135 insertions(+), 16 deletions(-)
>>>


Re: [PATCH 0/2] support kdump for AMD secure memory encryption(sme)

2018-05-17 Thread lijiang
在 2018年05月15日 21:31, Tom Lendacky 写道:
> On 5/14/2018 8:51 PM, Lianbo Jiang wrote:
>> It is convenient to remap the old memory encrypted to the second kernel by
>> calling ioremap_encrypted().
>>
>> When sme enabled on AMD server, we also need to support kdump. Because
>> the memory is encrypted in the first kernel, we will remap the old memory
>> encrypted to the second kernel(crash kernel), and sme is also enabled in
>> the second kernel, otherwise the old memory encrypted can not be decrypted.
>> Because simply changing the value of a C-bit on a page will not
>> automatically encrypt the existing contents of a page, and any data in the
>> page prior to the C-bit modification will become unintelligible. A page of
>> memory that is marked encrypted will be automatically decrypted when read
>> from DRAM and will be automatically encrypted when written to DRAM.
>>
>> For the kdump, it is necessary to distinguish whether the memory is
>> encrypted. Furthermore, we should also know which part of the memory is
>> encrypted or decrypted. We will appropriately remap the memory according
>> to the specific situation in order to tell cpu how to deal with the
>> data(encrypted or decrypted). For example, when sme enabled, if the old
>> memory is encrypted, we will remap the old memory in encrypted way, which
>> will automatically decrypt the old memory encrypted when we read those data
>> from the remapping address.
>>
>>  --
>> | first-kernel | second-kernel | kdump support |
>> |  (mem_encrypt=on|off)|   (yes|no)| 
>> |--+---+---|
>> | on   | on| yes   |
>> | off  | off   | yes   |
>> | on   | off   | no|
>> | off  | on| no|
>> |__|___|___|
>>
>> Test tools:
>> makedumpfile[v1.6.3]: https://github.com/LianboJ/makedumpfile
>> commit e1de103eca8f (A draft for kdump vmcore about AMD SME)
>> Author: Lianbo Jiang 
>> Date:   Mon May 14 17:02:40 2018 +0800
>> Note: This patch can only dump vmcore in the case of SME enabled.
>>
>> crash-7.2.1: https://github.com/crash-utility/crash.git
>> commit 1e1bd9c4c1be (Fix for the "bpf" command display on Linux 4.17-rc1)
>> Author: Dave Anderson 
>> Date:   Fri May 11 15:54:32 2018 -0400
>>
>> Test environment:
>> HP ProLiant DL385Gen10 AMD EPYC 7251
>> 8-Core Processor
>> 32768 MB memory
>> 600 GB disk space
>>
>> Linux 4.17-rc4:
>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>> commit 75bc37fefc44 ("Linux 4.17-rc4")
>> Author: Linus Torvalds 
>> Date:   Sun May 6 16:57:38 2018 -1000
>>
>> Reference:
>> AMD64 Architecture Programmer's Manual
>> https://support.amd.com/TechDocs/24593.pdf
>>
> 
> Have you also tested this with SEV?  It would be nice if the kdump
> changes you make work with both SME and SEV.
> 
Thank you, Tom.
This is a great question, we originally plan to implement SEV in subsequent 
patches, and
we are also working on SEV at present.
Furthermore, we have another known issue that the system can't jump into the 
second kernel
when SME is enabled and kaslr is disabled in kdump mode. It seems that is a 
complex problems,
maybe it is related to kaslr and SME, currently, i'm not sure the root cause, 
but we will
also plan to fix it. Can you give me any advice about this issue?

Thanks.
Lianbo
> Thanks,
> Tom
> 
>> Lianbo Jiang (2):
>>   add a function(ioremap_encrypted) for kdump when AMD sme enabled.
>>   support kdump when AMD secure memory encryption is active
>>
>>  arch/x86/include/asm/dmi.h  | 14 +-
>>  arch/x86/include/asm/io.h   |  2 ++
>>  arch/x86/kernel/acpi/boot.c |  8 
>>  arch/x86/kernel/crash_dump_64.c | 27 +++
>>  arch/x86/mm/ioremap.c   | 25 +
>>  drivers/acpi/tables.c   | 14 +-
>>  drivers/iommu/amd_iommu_init.c  |  9 -
>>  fs/proc/vmcore.c| 36 +++-
>>  include/linux/crash_dump.h  |  4 
>>  kernel/kexec_core.c | 12 
>>  10 files changed, 135 insertions(+), 16 deletions(-)
>>


Re: [PATCH 0/2] support kdump for AMD secure memory encryption(sme)

2018-05-17 Thread lijiang
在 2018年05月15日 21:31, Tom Lendacky 写道:
> On 5/14/2018 8:51 PM, Lianbo Jiang wrote:
>> It is convenient to remap the old memory encrypted to the second kernel by
>> calling ioremap_encrypted().
>>
>> When sme enabled on AMD server, we also need to support kdump. Because
>> the memory is encrypted in the first kernel, we will remap the old memory
>> encrypted to the second kernel(crash kernel), and sme is also enabled in
>> the second kernel, otherwise the old memory encrypted can not be decrypted.
>> Because simply changing the value of a C-bit on a page will not
>> automatically encrypt the existing contents of a page, and any data in the
>> page prior to the C-bit modification will become unintelligible. A page of
>> memory that is marked encrypted will be automatically decrypted when read
>> from DRAM and will be automatically encrypted when written to DRAM.
>>
>> For the kdump, it is necessary to distinguish whether the memory is
>> encrypted. Furthermore, we should also know which part of the memory is
>> encrypted or decrypted. We will appropriately remap the memory according
>> to the specific situation in order to tell cpu how to deal with the
>> data(encrypted or decrypted). For example, when sme enabled, if the old
>> memory is encrypted, we will remap the old memory in encrypted way, which
>> will automatically decrypt the old memory encrypted when we read those data
>> from the remapping address.
>>
>>  --
>> | first-kernel | second-kernel | kdump support |
>> |  (mem_encrypt=on|off)|   (yes|no)| 
>> |--+---+---|
>> | on   | on| yes   |
>> | off  | off   | yes   |
>> | on   | off   | no|
>> | off  | on| no|
>> |__|___|___|
>>
>> Test tools:
>> makedumpfile[v1.6.3]: https://github.com/LianboJ/makedumpfile
>> commit e1de103eca8f (A draft for kdump vmcore about AMD SME)
>> Author: Lianbo Jiang 
>> Date:   Mon May 14 17:02:40 2018 +0800
>> Note: This patch can only dump vmcore in the case of SME enabled.
>>
>> crash-7.2.1: https://github.com/crash-utility/crash.git
>> commit 1e1bd9c4c1be (Fix for the "bpf" command display on Linux 4.17-rc1)
>> Author: Dave Anderson 
>> Date:   Fri May 11 15:54:32 2018 -0400
>>
>> Test environment:
>> HP ProLiant DL385Gen10 AMD EPYC 7251
>> 8-Core Processor
>> 32768 MB memory
>> 600 GB disk space
>>
>> Linux 4.17-rc4:
>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>> commit 75bc37fefc44 ("Linux 4.17-rc4")
>> Author: Linus Torvalds 
>> Date:   Sun May 6 16:57:38 2018 -1000
>>
>> Reference:
>> AMD64 Architecture Programmer's Manual
>> https://support.amd.com/TechDocs/24593.pdf
>>
> 
> Have you also tested this with SEV?  It would be nice if the kdump
> changes you make work with both SME and SEV.
> 
Thank you, Tom.
This is a great question, we originally plan to implement SEV in subsequent 
patches, and
we are also working on SEV at present.
Furthermore, we have another known issue that the system can't jump into the 
second kernel
when SME is enabled and kaslr is disabled in kdump mode. It seems that is a 
complex problems,
maybe it is related to kaslr and SME, currently, i'm not sure the root cause, 
but we will
also plan to fix it. Can you give me any advice about this issue?

Thanks.
Lianbo
> Thanks,
> Tom
> 
>> Lianbo Jiang (2):
>>   add a function(ioremap_encrypted) for kdump when AMD sme enabled.
>>   support kdump when AMD secure memory encryption is active
>>
>>  arch/x86/include/asm/dmi.h  | 14 +-
>>  arch/x86/include/asm/io.h   |  2 ++
>>  arch/x86/kernel/acpi/boot.c |  8 
>>  arch/x86/kernel/crash_dump_64.c | 27 +++
>>  arch/x86/mm/ioremap.c   | 25 +
>>  drivers/acpi/tables.c   | 14 +-
>>  drivers/iommu/amd_iommu_init.c  |  9 -
>>  fs/proc/vmcore.c| 36 +++-
>>  include/linux/crash_dump.h  |  4 
>>  kernel/kexec_core.c | 12 
>>  10 files changed, 135 insertions(+), 16 deletions(-)
>>


Re: [PATCH 2/2] support kdump when AMD secure memory encryption is active

2018-05-16 Thread lijiang
在 2018年05月16日 04:18, Tom Lendacky 写道:
> On 5/14/2018 8:51 PM, Lianbo Jiang wrote:
>> When sme enabled on AMD server, we also need to support kdump. Because
>> the memory is encrypted in the first kernel, we will remap the old memory
>> encrypted to the second kernel(crash kernel), and sme is also enabled in
>> the second kernel, otherwise the old memory encrypted can not be decrypted.
>> Because simply changing the value of a C-bit on a page will not
>> automatically encrypt the existing contents of a page, and any data in the
>> page prior to the C-bit modification will become unintelligible. A page of
>> memory that is marked encrypted will be automatically decrypted when read
>> from DRAM and will be automatically encrypted when written to DRAM.
>>
>> For the kdump, it is necessary to distinguish whether the memory is
>> encrypted. Furthermore, we should also know which part of the memory is
>> encrypted or decrypted. We will appropriately remap the memory according
>> to the specific situation in order to tell cpu how to deal with the data(
>> encrypted or unencrypted). For example, when sme enabled, if the old memory
>> is encrypted, we will remap the old memory in encrypted way, which will
>> automatically decrypt the old memory encrypted when we read those data from
>> the remapping address.
>>
>>  --
>> | first-kernel | second-kernel | kdump support |
>> |  (mem_encrypt=on|off)|   (yes|no)|
>> |--+---+---|
>> | on   | on| yes   |
>> | off  | off   | yes   |
>> | on   | off   | no|
>> | off  | on| no|
>> |__|___|___|
>>
>> Signed-off-by: Lianbo Jiang 
>> ---
>>  arch/x86/include/asm/dmi.h  | 14 +-
>>  arch/x86/kernel/acpi/boot.c |  8 
>>  arch/x86/kernel/crash_dump_64.c | 27 +++
>>  drivers/acpi/tables.c   | 14 +-
>>  drivers/iommu/amd_iommu_init.c  |  9 -
>>  fs/proc/vmcore.c| 36 +++-
>>  include/linux/crash_dump.h  |  4 
>>  kernel/kexec_core.c | 12 
>>  8 files changed, 116 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/dmi.h b/arch/x86/include/asm/dmi.h
>> index 0ab2ab2..a5663b4 100644
>> --- a/arch/x86/include/asm/dmi.h
>> +++ b/arch/x86/include/asm/dmi.h
>> @@ -7,6 +7,10 @@
>>  
>>  #include 
>>  #include 
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> 
> I don't think you need all of the #ifdef stuff throughout this
> patch.  Everything should work just fine without it.
> 
>> +#include 
>> +#include 
>> +#endif
>>  
>>  static __always_inline __init void *dmi_alloc(unsigned len)
>>  {
>> @@ -14,7 +18,15 @@ static __always_inline __init void *dmi_alloc(unsigned 
>> len)
>>  }
>>  
>>  /* Use early IO mappings for DMI because it's initialized early */
>> -#define dmi_early_remap early_memremap
>> +static __always_inline __init void *dmi_early_remap(resource_size_t
>> +phys_addr, unsigned long size)
>> +{
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> 
> Again, no need for the #ifdef here.  You should probably audit the
> code for all of these and truly determine if they are really needed.
> 
>> +if (sme_active() && is_kdump_kernel())
> 
> Use of sme_active() here is good since under SEV, this area will be
> encrypted.
> 
>> +return early_memremap_decrypted(phys_addr, size);
>> +#endif
>> +return early_memremap(phys_addr, size);
> 
> Instead of doing this, maybe it makes more sense to put this logic
> somewhere in the early_memremap() path.  Possibly smarten up the
> early_memremap_pgprot_adjust() function with some kdump kernel
> related logic.  Not sure it's possible, but would be nice since you
> have this logic in a couple of places.
> 
>> +}
>>  #define dmi_early_unmap early_memunmap
>>  #define dmi_remap(_x, _l)   memremap(_x, _l, MEMREMAP_WB)
>>  #define dmi_unmap(_x)   memunmap(_x)
>> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
>> index 3b20607..354ad66 100644
>> --- a/arch/x86/kernel/acpi/boot.c
>> +++ b/arch/x86/kernel/acpi/boot.c
>> @@ -48,6 +48,10 @@
>>  #include 
>>  #include 
>>  #include 
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +#include 
>> +#include 
>> +#endif
>>  
>>  #include "sleep.h" /* To include x86_acpi_suspend_lowlevel */
>>  static int __initdata acpi_force = 0;
>> @@ -124,6 +128,10 @@ void __init __iomem *__acpi_map_table(unsigned long 
>> phys, unsigned long size)
>>  if (!phys || !size)
>>  return NULL;
>>  
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +if (sme_active() && is_kdump_kernel())
>> +return early_memremap_decrypted(phys, size);
>> +#endif
> 
> Same as previous comment(s).
> 
>>  return 

Re: [PATCH 2/2] support kdump when AMD secure memory encryption is active

2018-05-16 Thread lijiang
在 2018年05月16日 04:18, Tom Lendacky 写道:
> On 5/14/2018 8:51 PM, Lianbo Jiang wrote:
>> When sme enabled on AMD server, we also need to support kdump. Because
>> the memory is encrypted in the first kernel, we will remap the old memory
>> encrypted to the second kernel(crash kernel), and sme is also enabled in
>> the second kernel, otherwise the old memory encrypted can not be decrypted.
>> Because simply changing the value of a C-bit on a page will not
>> automatically encrypt the existing contents of a page, and any data in the
>> page prior to the C-bit modification will become unintelligible. A page of
>> memory that is marked encrypted will be automatically decrypted when read
>> from DRAM and will be automatically encrypted when written to DRAM.
>>
>> For the kdump, it is necessary to distinguish whether the memory is
>> encrypted. Furthermore, we should also know which part of the memory is
>> encrypted or decrypted. We will appropriately remap the memory according
>> to the specific situation in order to tell cpu how to deal with the data(
>> encrypted or unencrypted). For example, when sme enabled, if the old memory
>> is encrypted, we will remap the old memory in encrypted way, which will
>> automatically decrypt the old memory encrypted when we read those data from
>> the remapping address.
>>
>>  --
>> | first-kernel | second-kernel | kdump support |
>> |  (mem_encrypt=on|off)|   (yes|no)|
>> |--+---+---|
>> | on   | on| yes   |
>> | off  | off   | yes   |
>> | on   | off   | no|
>> | off  | on| no|
>> |__|___|___|
>>
>> Signed-off-by: Lianbo Jiang 
>> ---
>>  arch/x86/include/asm/dmi.h  | 14 +-
>>  arch/x86/kernel/acpi/boot.c |  8 
>>  arch/x86/kernel/crash_dump_64.c | 27 +++
>>  drivers/acpi/tables.c   | 14 +-
>>  drivers/iommu/amd_iommu_init.c  |  9 -
>>  fs/proc/vmcore.c| 36 +++-
>>  include/linux/crash_dump.h  |  4 
>>  kernel/kexec_core.c | 12 
>>  8 files changed, 116 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/dmi.h b/arch/x86/include/asm/dmi.h
>> index 0ab2ab2..a5663b4 100644
>> --- a/arch/x86/include/asm/dmi.h
>> +++ b/arch/x86/include/asm/dmi.h
>> @@ -7,6 +7,10 @@
>>  
>>  #include 
>>  #include 
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> 
> I don't think you need all of the #ifdef stuff throughout this
> patch.  Everything should work just fine without it.
> 
>> +#include 
>> +#include 
>> +#endif
>>  
>>  static __always_inline __init void *dmi_alloc(unsigned len)
>>  {
>> @@ -14,7 +18,15 @@ static __always_inline __init void *dmi_alloc(unsigned 
>> len)
>>  }
>>  
>>  /* Use early IO mappings for DMI because it's initialized early */
>> -#define dmi_early_remap early_memremap
>> +static __always_inline __init void *dmi_early_remap(resource_size_t
>> +phys_addr, unsigned long size)
>> +{
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> 
> Again, no need for the #ifdef here.  You should probably audit the
> code for all of these and truly determine if they are really needed.
> 
>> +if (sme_active() && is_kdump_kernel())
> 
> Use of sme_active() here is good since under SEV, this area will be
> encrypted.
> 
>> +return early_memremap_decrypted(phys_addr, size);
>> +#endif
>> +return early_memremap(phys_addr, size);
> 
> Instead of doing this, maybe it makes more sense to put this logic
> somewhere in the early_memremap() path.  Possibly smarten up the
> early_memremap_pgprot_adjust() function with some kdump kernel
> related logic.  Not sure it's possible, but would be nice since you
> have this logic in a couple of places.
> 
>> +}
>>  #define dmi_early_unmap early_memunmap
>>  #define dmi_remap(_x, _l)   memremap(_x, _l, MEMREMAP_WB)
>>  #define dmi_unmap(_x)   memunmap(_x)
>> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
>> index 3b20607..354ad66 100644
>> --- a/arch/x86/kernel/acpi/boot.c
>> +++ b/arch/x86/kernel/acpi/boot.c
>> @@ -48,6 +48,10 @@
>>  #include 
>>  #include 
>>  #include 
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +#include 
>> +#include 
>> +#endif
>>  
>>  #include "sleep.h" /* To include x86_acpi_suspend_lowlevel */
>>  static int __initdata acpi_force = 0;
>> @@ -124,6 +128,10 @@ void __init __iomem *__acpi_map_table(unsigned long 
>> phys, unsigned long size)
>>  if (!phys || !size)
>>  return NULL;
>>  
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +if (sme_active() && is_kdump_kernel())
>> +return early_memremap_decrypted(phys, size);
>> +#endif
> 
> Same as previous comment(s).
> 
>>  return early_memremap(phys, size);
>>  }

Re: [PATCH 2/2] support kdump when AMD secure memory encryption is active

2018-05-16 Thread lijiang
在 2018年05月16日 04:18, Tom Lendacky 写道:
> On 5/14/2018 8:51 PM, Lianbo Jiang wrote:
>> When sme enabled on AMD server, we also need to support kdump. Because
>> the memory is encrypted in the first kernel, we will remap the old memory
>> encrypted to the second kernel(crash kernel), and sme is also enabled in
>> the second kernel, otherwise the old memory encrypted can not be decrypted.
>> Because simply changing the value of a C-bit on a page will not
>> automatically encrypt the existing contents of a page, and any data in the
>> page prior to the C-bit modification will become unintelligible. A page of
>> memory that is marked encrypted will be automatically decrypted when read
>> from DRAM and will be automatically encrypted when written to DRAM.
>>
>> For the kdump, it is necessary to distinguish whether the memory is
>> encrypted. Furthermore, we should also know which part of the memory is
>> encrypted or decrypted. We will appropriately remap the memory according
>> to the specific situation in order to tell cpu how to deal with the data(
>> encrypted or unencrypted). For example, when sme enabled, if the old memory
>> is encrypted, we will remap the old memory in encrypted way, which will
>> automatically decrypt the old memory encrypted when we read those data from
>> the remapping address.
>>
>>  --
>> | first-kernel | second-kernel | kdump support |
>> |  (mem_encrypt=on|off)|   (yes|no)|
>> |--+---+---|
>> | on   | on| yes   |
>> | off  | off   | yes   |
>> | on   | off   | no|
>> | off  | on| no|
>> |__|___|___|
>>
>> Signed-off-by: Lianbo Jiang 
>> ---
>>  arch/x86/include/asm/dmi.h  | 14 +-
>>  arch/x86/kernel/acpi/boot.c |  8 
>>  arch/x86/kernel/crash_dump_64.c | 27 +++
>>  drivers/acpi/tables.c   | 14 +-
>>  drivers/iommu/amd_iommu_init.c  |  9 -
>>  fs/proc/vmcore.c| 36 +++-
>>  include/linux/crash_dump.h  |  4 
>>  kernel/kexec_core.c | 12 
>>  8 files changed, 116 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/dmi.h b/arch/x86/include/asm/dmi.h
>> index 0ab2ab2..a5663b4 100644
>> --- a/arch/x86/include/asm/dmi.h
>> +++ b/arch/x86/include/asm/dmi.h
>> @@ -7,6 +7,10 @@
>>  
>>  #include 
>>  #include 
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> 
> I don't think you need all of the #ifdef stuff throughout this
> patch.  Everything should work just fine without it.
> 
Thanks Tom. The macro will be deleted from this patch.
Thanks.

Lianbo
>> +#include 
>> +#include 
>> +#endif
>>  
>>  static __always_inline __init void *dmi_alloc(unsigned len)
>>  {
>> @@ -14,7 +18,15 @@ static __always_inline __init void *dmi_alloc(unsigned 
>> len)
>>  }
>>  
>>  /* Use early IO mappings for DMI because it's initialized early */
>> -#define dmi_early_remap early_memremap
>> +static __always_inline __init void *dmi_early_remap(resource_size_t
>> +phys_addr, unsigned long size)
>> +{
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> 
> Again, no need for the #ifdef here.  You should probably audit the
> code for all of these and truly determine if they are really needed.
> 
It will be fixed in the patch v2.
Thanks.
>> +if (sme_active() && is_kdump_kernel())
> 
> Use of sme_active() here is good since under SEV, this area will be
> encrypted.
> 
>> +return early_memremap_decrypted(phys_addr, size);
>> +#endif
>> +return early_memremap(phys_addr, size);
> 
> Instead of doing this, maybe it makes more sense to put this logic
> somewhere in the early_memremap() path.  Possibly smarten up the
> early_memremap_pgprot_adjust() function with some kdump kernel
> related logic.  Not sure it's possible, but would be nice since you
> have this logic in a couple of places.
> 
Good idea. If we put this logic into the early_memremap path, there are
many codes that will not have to be modified, for example, dmi, acpi.

Thanks.
>> +}
>>  #define dmi_early_unmap early_memunmap
>>  #define dmi_remap(_x, _l)   memremap(_x, _l, MEMREMAP_WB)
>>  #define dmi_unmap(_x)   memunmap(_x)
>> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
>> index 3b20607..354ad66 100644
>> --- a/arch/x86/kernel/acpi/boot.c
>> +++ b/arch/x86/kernel/acpi/boot.c
>> @@ -48,6 +48,10 @@
>>  #include 
>>  #include 
>>  #include 
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +#include 
>> +#include 
>> +#endif
>>  
>>  #include "sleep.h" /* To include x86_acpi_suspend_lowlevel */
>>  static int __initdata acpi_force = 0;
>> @@ -124,6 +128,10 @@ void __init __iomem *__acpi_map_table(unsigned long 
>> phys, unsigned long size)
>>  if 

Re: [PATCH 2/2] support kdump when AMD secure memory encryption is active

2018-05-16 Thread lijiang
在 2018年05月16日 04:18, Tom Lendacky 写道:
> On 5/14/2018 8:51 PM, Lianbo Jiang wrote:
>> When sme enabled on AMD server, we also need to support kdump. Because
>> the memory is encrypted in the first kernel, we will remap the old memory
>> encrypted to the second kernel(crash kernel), and sme is also enabled in
>> the second kernel, otherwise the old memory encrypted can not be decrypted.
>> Because simply changing the value of a C-bit on a page will not
>> automatically encrypt the existing contents of a page, and any data in the
>> page prior to the C-bit modification will become unintelligible. A page of
>> memory that is marked encrypted will be automatically decrypted when read
>> from DRAM and will be automatically encrypted when written to DRAM.
>>
>> For the kdump, it is necessary to distinguish whether the memory is
>> encrypted. Furthermore, we should also know which part of the memory is
>> encrypted or decrypted. We will appropriately remap the memory according
>> to the specific situation in order to tell cpu how to deal with the data(
>> encrypted or unencrypted). For example, when sme enabled, if the old memory
>> is encrypted, we will remap the old memory in encrypted way, which will
>> automatically decrypt the old memory encrypted when we read those data from
>> the remapping address.
>>
>>  --
>> | first-kernel | second-kernel | kdump support |
>> |  (mem_encrypt=on|off)|   (yes|no)|
>> |--+---+---|
>> | on   | on| yes   |
>> | off  | off   | yes   |
>> | on   | off   | no|
>> | off  | on| no|
>> |__|___|___|
>>
>> Signed-off-by: Lianbo Jiang 
>> ---
>>  arch/x86/include/asm/dmi.h  | 14 +-
>>  arch/x86/kernel/acpi/boot.c |  8 
>>  arch/x86/kernel/crash_dump_64.c | 27 +++
>>  drivers/acpi/tables.c   | 14 +-
>>  drivers/iommu/amd_iommu_init.c  |  9 -
>>  fs/proc/vmcore.c| 36 +++-
>>  include/linux/crash_dump.h  |  4 
>>  kernel/kexec_core.c | 12 
>>  8 files changed, 116 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/dmi.h b/arch/x86/include/asm/dmi.h
>> index 0ab2ab2..a5663b4 100644
>> --- a/arch/x86/include/asm/dmi.h
>> +++ b/arch/x86/include/asm/dmi.h
>> @@ -7,6 +7,10 @@
>>  
>>  #include 
>>  #include 
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> 
> I don't think you need all of the #ifdef stuff throughout this
> patch.  Everything should work just fine without it.
> 
Thanks Tom. The macro will be deleted from this patch.
Thanks.

Lianbo
>> +#include 
>> +#include 
>> +#endif
>>  
>>  static __always_inline __init void *dmi_alloc(unsigned len)
>>  {
>> @@ -14,7 +18,15 @@ static __always_inline __init void *dmi_alloc(unsigned 
>> len)
>>  }
>>  
>>  /* Use early IO mappings for DMI because it's initialized early */
>> -#define dmi_early_remap early_memremap
>> +static __always_inline __init void *dmi_early_remap(resource_size_t
>> +phys_addr, unsigned long size)
>> +{
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> 
> Again, no need for the #ifdef here.  You should probably audit the
> code for all of these and truly determine if they are really needed.
> 
It will be fixed in the patch v2.
Thanks.
>> +if (sme_active() && is_kdump_kernel())
> 
> Use of sme_active() here is good since under SEV, this area will be
> encrypted.
> 
>> +return early_memremap_decrypted(phys_addr, size);
>> +#endif
>> +return early_memremap(phys_addr, size);
> 
> Instead of doing this, maybe it makes more sense to put this logic
> somewhere in the early_memremap() path.  Possibly smarten up the
> early_memremap_pgprot_adjust() function with some kdump kernel
> related logic.  Not sure it's possible, but would be nice since you
> have this logic in a couple of places.
> 
Good idea. If we put this logic into the early_memremap path, there are
many codes that will not have to be modified, for example, dmi, acpi.

Thanks.
>> +}
>>  #define dmi_early_unmap early_memunmap
>>  #define dmi_remap(_x, _l)   memremap(_x, _l, MEMREMAP_WB)
>>  #define dmi_unmap(_x)   memunmap(_x)
>> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
>> index 3b20607..354ad66 100644
>> --- a/arch/x86/kernel/acpi/boot.c
>> +++ b/arch/x86/kernel/acpi/boot.c
>> @@ -48,6 +48,10 @@
>>  #include 
>>  #include 
>>  #include 
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +#include 
>> +#include 
>> +#endif
>>  
>>  #include "sleep.h" /* To include x86_acpi_suspend_lowlevel */
>>  static int __initdata acpi_force = 0;
>> @@ -124,6 +128,10 @@ void __init __iomem *__acpi_map_table(unsigned long 
>> phys, unsigned long size)
>>  if (!phys || !size)
>>   

Re: [PATCH 1/2] add a function(ioremap_encrypted) for kdump when AMD sme enabled.

2018-05-16 Thread lijiang


在 2018年05月15日 22:34, Tom Lendacky 写道:
> On 5/14/2018 8:51 PM, Lianbo Jiang wrote:
>> It is convenient to remap the old memory encrypted to the second kernel
>> by calling ioremap_encrypted().
>>
>> Signed-off-by: Lianbo Jiang 
>> ---
>>  arch/x86/include/asm/io.h |  2 ++
>>  arch/x86/mm/ioremap.c | 25 +
>>  2 files changed, 19 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>> index f6e5b93..06d2a9f 100644
>> --- a/arch/x86/include/asm/io.h
>> +++ b/arch/x86/include/asm/io.h
>> @@ -192,6 +192,8 @@ extern void __iomem *ioremap_cache(resource_size_t 
>> offset, unsigned long size);
>>  #define ioremap_cache ioremap_cache
>>  extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long 
>> size, unsigned long prot_val);
>>  #define ioremap_prot ioremap_prot
>> +extern void __iomem *ioremap_encrypted(resource_size_t phys_addr, unsigned 
>> long size);
>> +#define ioremap_encrypted ioremap_encrypted
>>  
>>  /**
>>   * ioremap -   map bus memory into CPU space
>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>> index c63a545..7a52d1e 100644
>> --- a/arch/x86/mm/ioremap.c
>> +++ b/arch/x86/mm/ioremap.c
>> @@ -131,7 +131,8 @@ static void __ioremap_check_mem(resource_size_t addr, 
>> unsigned long size,
>>   * caller shouldn't need to know that small detail.
>>   */
>>  static void __iomem *__ioremap_caller(resource_size_t phys_addr,
>> -unsigned long size, enum page_cache_mode pcm, void *caller)
>> +unsigned long size, enum page_cache_mode pcm,
>> +void *caller, bool encrypted)
>>  {
>>  unsigned long offset, vaddr;
>>  resource_size_t last_addr;
>> @@ -199,7 +200,8 @@ static void __iomem *__ioremap_caller(resource_size_t 
>> phys_addr,
>>   * resulting mapping.
>>   */
>>  prot = PAGE_KERNEL_IO;
>> -if (sev_active() && mem_flags.desc_other)
>> +if ((sev_active() && mem_flags.desc_other) ||
>> +(encrypted && sme_active()))
> 
> You only need the encrypted check here.  If sme is not active,
> then the pgprot_encrypted will basically be a no-op.
> 
Great! Thank you, Tom.
It will be fixed.

Lianbo
> Also, extra indents.
> 
> Thanks,
> Tom
> 
>>  prot = pgprot_encrypted(prot);
>>  
>>  switch (pcm) {
>> @@ -291,7 +293,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, 
>> unsigned long size)
>>  enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS;
>>  
>>  return __ioremap_caller(phys_addr, size, pcm,
>> -__builtin_return_address(0));
>> +__builtin_return_address(0), false);
>>  }
>>  EXPORT_SYMBOL(ioremap_nocache);
>>  
>> @@ -324,7 +326,7 @@ void __iomem *ioremap_uc(resource_size_t phys_addr, 
>> unsigned long size)
>>  enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC;
>>  
>>  return __ioremap_caller(phys_addr, size, pcm,
>> -__builtin_return_address(0));
>> +__builtin_return_address(0), false);
>>  }
>>  EXPORT_SYMBOL_GPL(ioremap_uc);
>>  
>> @@ -341,7 +343,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
>>  void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
>>  {
>>  return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
>> -__builtin_return_address(0));
>> +__builtin_return_address(0), false);
>>  }
>>  EXPORT_SYMBOL(ioremap_wc);
>>  
>> @@ -358,14 +360,21 @@ EXPORT_SYMBOL(ioremap_wc);
>>  void __iomem *ioremap_wt(resource_size_t phys_addr, unsigned long size)
>>  {
>>  return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT,
>> -__builtin_return_address(0));
>> +__builtin_return_address(0), false);
>>  }
>>  EXPORT_SYMBOL(ioremap_wt);
>>  
>> +void __iomem *ioremap_encrypted(resource_size_t phys_addr, unsigned long 
>> size)
>> +{
>> +return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
>> +__builtin_return_address(0), true);
>> +}
>> +EXPORT_SYMBOL(ioremap_encrypted);
>> +
>>  void __iomem *ioremap_cache(resource_size_t phys_addr, unsigned long size)
>>  {
>>  return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
>> -__builtin_return_address(0));
>> +__builtin_return_address(0), false);
>>  }
>>  EXPORT_SYMBOL(ioremap_cache);
>>  
>> @@ -374,7 +383,7 @@ void __iomem *ioremap_prot(resource_size_t phys_addr, 
>> unsigned long size,
>>  {
>>  return __ioremap_caller(phys_addr, size,
>>  pgprot2cachemode(__pgprot(prot_val)),
>> -__builtin_return_address(0));
>> +__builtin_return_address(0), false);
>>  }
>>  EXPORT_SYMBOL(ioremap_prot);
>>  
>>


Re: [PATCH 1/2] add a function(ioremap_encrypted) for kdump when AMD sme enabled.

2018-05-16 Thread lijiang


在 2018年05月15日 22:34, Tom Lendacky 写道:
> On 5/14/2018 8:51 PM, Lianbo Jiang wrote:
>> It is convenient to remap the old memory encrypted to the second kernel
>> by calling ioremap_encrypted().
>>
>> Signed-off-by: Lianbo Jiang 
>> ---
>>  arch/x86/include/asm/io.h |  2 ++
>>  arch/x86/mm/ioremap.c | 25 +
>>  2 files changed, 19 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>> index f6e5b93..06d2a9f 100644
>> --- a/arch/x86/include/asm/io.h
>> +++ b/arch/x86/include/asm/io.h
>> @@ -192,6 +192,8 @@ extern void __iomem *ioremap_cache(resource_size_t 
>> offset, unsigned long size);
>>  #define ioremap_cache ioremap_cache
>>  extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long 
>> size, unsigned long prot_val);
>>  #define ioremap_prot ioremap_prot
>> +extern void __iomem *ioremap_encrypted(resource_size_t phys_addr, unsigned 
>> long size);
>> +#define ioremap_encrypted ioremap_encrypted
>>  
>>  /**
>>   * ioremap -   map bus memory into CPU space
>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>> index c63a545..7a52d1e 100644
>> --- a/arch/x86/mm/ioremap.c
>> +++ b/arch/x86/mm/ioremap.c
>> @@ -131,7 +131,8 @@ static void __ioremap_check_mem(resource_size_t addr, 
>> unsigned long size,
>>   * caller shouldn't need to know that small detail.
>>   */
>>  static void __iomem *__ioremap_caller(resource_size_t phys_addr,
>> -unsigned long size, enum page_cache_mode pcm, void *caller)
>> +unsigned long size, enum page_cache_mode pcm,
>> +void *caller, bool encrypted)
>>  {
>>  unsigned long offset, vaddr;
>>  resource_size_t last_addr;
>> @@ -199,7 +200,8 @@ static void __iomem *__ioremap_caller(resource_size_t 
>> phys_addr,
>>   * resulting mapping.
>>   */
>>  prot = PAGE_KERNEL_IO;
>> -if (sev_active() && mem_flags.desc_other)
>> +if ((sev_active() && mem_flags.desc_other) ||
>> +(encrypted && sme_active()))
> 
> You only need the encrypted check here.  If sme is not active,
> then the pgprot_encrypted will basically be a no-op.
> 
Great! Thank you, Tom.
It will be fixed.

Lianbo
> Also, extra indents.
> 
> Thanks,
> Tom
> 
>>  prot = pgprot_encrypted(prot);
>>  
>>  switch (pcm) {
>> @@ -291,7 +293,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, 
>> unsigned long size)
>>  enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS;
>>  
>>  return __ioremap_caller(phys_addr, size, pcm,
>> -__builtin_return_address(0));
>> +__builtin_return_address(0), false);
>>  }
>>  EXPORT_SYMBOL(ioremap_nocache);
>>  
>> @@ -324,7 +326,7 @@ void __iomem *ioremap_uc(resource_size_t phys_addr, 
>> unsigned long size)
>>  enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC;
>>  
>>  return __ioremap_caller(phys_addr, size, pcm,
>> -__builtin_return_address(0));
>> +__builtin_return_address(0), false);
>>  }
>>  EXPORT_SYMBOL_GPL(ioremap_uc);
>>  
>> @@ -341,7 +343,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
>>  void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
>>  {
>>  return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
>> -__builtin_return_address(0));
>> +__builtin_return_address(0), false);
>>  }
>>  EXPORT_SYMBOL(ioremap_wc);
>>  
>> @@ -358,14 +360,21 @@ EXPORT_SYMBOL(ioremap_wc);
>>  void __iomem *ioremap_wt(resource_size_t phys_addr, unsigned long size)
>>  {
>>  return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT,
>> -__builtin_return_address(0));
>> +__builtin_return_address(0), false);
>>  }
>>  EXPORT_SYMBOL(ioremap_wt);
>>  
>> +void __iomem *ioremap_encrypted(resource_size_t phys_addr, unsigned long 
>> size)
>> +{
>> +return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
>> +__builtin_return_address(0), true);
>> +}
>> +EXPORT_SYMBOL(ioremap_encrypted);
>> +
>>  void __iomem *ioremap_cache(resource_size_t phys_addr, unsigned long size)
>>  {
>>  return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
>> -__builtin_return_address(0));
>> +__builtin_return_address(0), false);
>>  }
>>  EXPORT_SYMBOL(ioremap_cache);
>>  
>> @@ -374,7 +383,7 @@ void __iomem *ioremap_prot(resource_size_t phys_addr, 
>> unsigned long size,
>>  {
>>  return __ioremap_caller(phys_addr, size,
>>  pgprot2cachemode(__pgprot(prot_val)),
>> -__builtin_return_address(0));
>> +__builtin_return_address(0), false);
>>  }
>>  EXPORT_SYMBOL(ioremap_prot);
>>  
>>