RE: [RFC PATCH v2 0/3] pmem memmap dump support

2023-04-28 Thread Dan Williams
Li Zhijian wrote:
> Hello folks,
> 
> About 2 months ago, we posted our first RFC[3] and received your kind
> feedback. Thank you :)
> Now, I'm back with the code.
> 
> Currently, this RFC has implemented support for case D*, and cases A & B are
> deliberately disabled in makedumpfile. It includes changes to 3 source trees,
> as below:

I think the reason this patchkit is difficult to follow is that it
spends a lot of time describing a chosen solution, but not enough time
describing the problem and the tradeoffs.

For example why is updating /proc/vmcore with pmem metadata the chosen
solution? Why not leave the kernel out of it and have makedumpfile
tooling aware of how to parse persistent memory namespace info-blocks
and retrieve that dump itself? This is what I proposed here:

http://lore.kernel.org/r/641484f7ef780_a52e2...@dwillia2-mobl3.amr.corp.intel.com.notmuch

...but never got an answer, or I missed the answer.
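(A rough userspace sketch of that alternative, for concreteness: parse the pfn info-block directly from the namespace. The struct below is trimmed from nd_pfn_sb in drivers/nvdimm/pfn.h, and the info-block sits 4 KiB into the namespace; pfn_info_block and parse_info_block() are illustrative names, not an existing API.)

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PFN_SIG_LEN   16
#define PFN_SIG       "NVDIMM_PFN_INFO" /* signature from drivers/nvdimm/pfn.h */
#define INFOBLOCK_OFF 4096              /* info-block offset within the namespace */

/* Trimmed view of struct nd_pfn_sb: just the fields a dump tool would
 * need to locate the on-device memmap (struct page array). */
struct pfn_info_block {
	uint8_t  signature[PFN_SIG_LEN];
	uint8_t  uuid[16];
	uint8_t  parent_uuid[16];
	uint32_t flags;
	uint16_t version_major;
	uint16_t version_minor;
	uint64_t dataoff;  /* start of user data, relative to namespace base */
	uint64_t npfns;
	uint32_t mode;     /* PFN_MODE_PMEM => memmap lives on the device */
	/* start_pad, end_trunc, align, checksum, ... omitted */
} __attribute__((packed));

static int parse_info_block(FILE *pmem, struct pfn_info_block *sb)
{
	if (fseek(pmem, INFOBLOCK_OFF, SEEK_SET) != 0 ||
	    fread(sb, sizeof(*sb), 1, pmem) != 1)
		return -1;
	if (memcmp(sb->signature, PFN_SIG, PFN_SIG_LEN) != 0)
		return -1; /* not a pfn/fsdax namespace */
	/* checksum verification omitted for brevity */
	return 0;
}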



Re: [PATCH v21 5/7] x86/crash: add x86 crash hotplug support

2023-04-28 Thread Hari Bathini



On 28/04/23 2:55 pm, Baoquan He wrote:

> On 04/27/23 at 10:26pm, Hari Bathini wrote:

> > On 27/04/23 2:19 pm, Baoquan He wrote:

> > > On 04/27/23 at 12:39pm, Hari Bathini wrote:
> > > > Hi Eric,
> > > > 
> > > > On 04/04/23 11:33 pm, Eric DeVolder wrote:

> > > > > When CPU or memory is hot un/plugged, or off/onlined, the crash
> > > > > elfcorehdr, which describes the CPUs and memory in the system,
> > > > > must also be updated.
> > > > > 
> > > > > The segment containing the elfcorehdr is identified at run-time
> > > > > in crash_core:crash_handle_hotplug_event(), which works for both
> > > > > the kexec_load() and kexec_file_load() syscalls. A new elfcorehdr
> > > > > is generated from the available CPUs and memory into a buffer,
> > > > > and then installed over the top of the existing elfcorehdr.
> > > > > 
> > > > > In the patch 'kexec: exclude elfcorehdr from the segment digest'
> > > > > the need to update purgatory due to the change in elfcorehdr was
> > > > > eliminated.  As a result, no changes to purgatory or boot_params
> > > > > (as the elfcorehdr= kernel command line parameter pointer
> > > > > remains unchanged and correct) are needed, just elfcorehdr.
> > > > > 
> > > > > To accommodate a growing number of resources via hotplug, the
> > > > > elfcorehdr segment must be sufficiently large to accommodate
> > > > > changes, see the CRASH_MAX_MEMORY_RANGES description. This is used
> > > > > only on the kexec_file_load() syscall; for kexec_load() userspace
> > > > > will need to size the segment similarly.
> > > > > 
> > > > > To accommodate kexec_load() syscall in the absence of


> > > > Firstly, thanks! This series is a nice improvement to kdump support
> > > > in a hotplug environment.
> > > > 
> > > > One concern though is that this change assumes corresponding support
> > > > in kexec-tools. Without that support kexec_load would fail to boot
> > > > with a digest verification failure, iiuc.


> > > Eric has posted a patchset to modify kexec-tools to support that; please
> > > see the link Eric pasted in the cover letter:
> > > 
> > > http://lists.infradead.org/pipermail/kexec/2022-October/026032.html


> > Right, Baoquan.
> > 
> > I did see that, and if I read the code correctly, without that patchset
> > kexec_load would fail. Not with an explicit error that hotplug support
> > is missing or such; it would simply fail to boot into the capture kernel
> > with a digest verification failure.
> > 
> > My suggestion was to avoid that userspace tool breakage for older
> > kexec-tools versions by introducing a new kexec flag that can tell the
> > kernel that kexec-tools is ready to use this in-kernel update support.
> > So, if kexec_load happens without the flag, avoid doing an in-kernel
> > update on hotplug. I hope that clears the confusion.


> Yeah, sounds like a good idea. It may be extended in a later patch.


Fixing it in this series itself would be a cleaner way, I guess.

Thanks
Hari
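
(A sketch of the flag gating suggested above: the flag name follows what later landed upstream as KEXEC_UPDATE_ELFCOREHDR, but the snippets are illustrative assumptions, not the merged patch.)

/* include/uapi/linux/kexec.h: new userspace opt-in flag */
#define KEXEC_UPDATE_ELFCOREHDR	0x00000004

/* kernel/kexec.c: record the opt-in when the image is loaded */
image->update_elfcorehdr = !!(flags & KEXEC_UPDATE_ELFCOREHDR);

/* kernel/crash_core.c, crash_handle_hotplug_event(): skip the in-kernel
 * elfcorehdr rewrite for kexec_load() images that did not opt in, so
 * older kexec-tools keep their old behavior unchanged */
if (!image->file_mode && !image->update_elfcorehdr)
	goto out;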


Re: [RFC PATCH v2 3/3] resource, crash: Make kexec_file_load support pmem

2023-04-28 Thread Zhijian Li (Fujitsu)
Greg,

Sorry for the *bad* changelog; this patch is currently more like a *hack* on
resource.c. Please allow me to rewrite it here:

Only the regions described by the PT_LOADs of /proc/vmcore are
dumpable/readable by dumping applications. Previously, on x86/x86_64, only
System RAM resources were injected into the PT_LOADs.

So, in order to make the entire pmem resource dumpable/readable, we need to
add the pmem regions to the PT_LOADs of /proc/vmcore.

Here we introduce a new API, walk_pmem_res(), to walk the pmem regions first.
Further, we also mark each pmem region with an extra p_flags bit, PF_DEV, when
it is added to the PT_LOADs. Dumping applications are then able to tell
whether a region is pmem or not from this flag and take special actions
accordingly.
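
(A consumer-side sketch of that flag check, as a dumping application might do
it. PF_DEV is the bit this RFC proposes, not an existing ELF flag, and the
value below is a placeholder assumption.)

#include <elf.h>

/* proposed by this RFC: segment is backed by a device (pmem);
 * 0x10 is a placeholder, not a value from the ELF spec */
#define PF_DEV 0x10

static int phdr_is_pmem(const Elf64_Phdr *phdr)
{
	return phdr->p_type == PT_LOAD && (phdr->p_flags & PF_DEV);
}

/* e.g. makedumpfile could route such segments to special handling,
 * such as skipping them or dumping them without page filtering */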

Thanks
Zhijian


On 27/04/2023 19:39, Greg Kroah-Hartman wrote:
> On Thu, Apr 27, 2023 at 06:18:34PM +0800, Li Zhijian wrote:
>> It does:
>> 1. Add pmem region into PT_LOADs of vmcore
>> 2. Mark pmem region's p_flags as PF_DEV
> 
> I'm sorry, but I can not parse this changelog.
> 
> Please take a look at the kernel documentation for how to write a good
> changelog message so that we can properly review the change you wish to
> have accepted.
> 
> thanks,
> 
> greg k-h

Re: [RFC PATCH v2 3/3] resource, crash: Make kexec_file_load support pmem

2023-04-28 Thread Zhijian Li (Fujitsu)
Jane,


On 28/04/2023 04:41, Jane Chu wrote:
>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
>> index cdd92ab43cda..dc9d03083565 100644
>> --- a/arch/x86/kernel/crash.c
>> +++ b/arch/x86/kernel/crash.c
>> @@ -178,6 +178,7 @@ static struct crash_mem *fill_up_crash_elf_data(void)
>>   if (!nr_ranges)
>>   return NULL;
>> +    walk_pmem_res(0, -1, &nr_ranges, get_nr_ram_ranges_callback);
> 
> So this will overwrite 'nr_ranges' produced by the previous
> walk_system_ram_res() call, are you sure it's correct?


It depends on how the callback passed to walk_system_ram_res() handles
'nr_ranges'; the callback only increments it, so this change is safe IMHO:

static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
{
	unsigned int *nr_ranges = arg;

	(*nr_ranges)++;
	return 0;
}

/* Gather all the required information to prepare elf headers for ram regions */
static struct crash_mem *fill_up_crash_elf_data(void)
{
	unsigned int nr_ranges = 0;
	struct crash_mem *cmem;

	walk_system_ram_res(0, -1, &nr_ranges, get_nr_ram_ranges_callback);
	if (!nr_ranges)
		return NULL;

	walk_pmem_res(0, -1, &nr_ranges, get_nr_ram_ranges_callback);

In the end, nr_ranges = #ram_res + #pmem_res.
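
(For completeness, roughly how the function continues upstream; the combined
count just sizes the crash_mem range array:)

/* upstream adds a little slack for possible range splits */
nr_ranges += 2;
cmem = vzalloc(struct_size(cmem, ranges, nr_ranges));
if (!cmem)
	return NULL;

cmem->max_nr_ranges = nr_ranges;
cmem->nr_ranges = 0;

return cmem;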

Thanks
Zhijian

> 
> Regards,
> -jane

Re: [RFC PATCH v2 2/3] drivers/nvdimm: export memmap of namespace to vmcoreinfo

2023-04-28 Thread Zhijian Li (Fujitsu)


On 28/04/2023 06:50, Ira Weiny wrote:
> Li Zhijian wrote:
>> Each namespace has its own memmap; it is updated when the namespace is
>> initialized/created, updated, or deleted.
>>
>> CC: Dan Williams 
>> CC: Vishal Verma 
>> CC: Dave Jiang 
>> CC: Ira Weiny 
>> Signed-off-by: Li Zhijian 
>> ---
>>   drivers/nvdimm/namespace_devs.c | 2 ++
>>   drivers/nvdimm/pfn_devs.c   | 3 +++
>>   2 files changed, 5 insertions(+)
>>
>> diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
>> index c60ec0b373c5..096203e6203f 100644
>> --- a/drivers/nvdimm/namespace_devs.c
>> +++ b/drivers/nvdimm/namespace_devs.c
>> @@ -8,6 +8,7 @@
>>   #include <linux/slab.h>
>>   #include <linux/list.h>
>>   #include <linux/nd.h>
>> +#include 
>>   #include "nd-core.h"
>>   #include "pmem.h"
>>   #include "pfn.h"
>> @@ -853,6 +854,7 @@ static ssize_t size_store(struct device *dev,
>>  if (rc == 0 && val == 0 && is_namespace_pmem(dev)) {
>>  struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
>>   
>> +devm_memmap_vmcore_delete(to_ndns(dev));
> 
> This seems like an odd place to put this.  Could you explain the reasoning
> more?
> 


Ira,

Users who want to manage pmem namespaces usually use the 'ndctl' command.
The following cases touch the memmap of a namespace:

a. create a namespace:
   'ndctl create-namespace -f -e namespace0.0 --mode=fsdax -s $(((1024+16)<<20)) -M dev'
b. change the namespace size:
   'ndctl create-namespace -f -e namespace0.0 --mode=fsdax -s $(((1024)<<20)) -M dev'
c. change the memmap location:
   'ndctl create-namespace -f -e namespace0.0 --mode=fsdax -s $(((1024+16)<<20)) -M mem'
d. destroy the namespace:
   'ndctl destroy-namespace -f namespace0.0'

Unlike the former 3 cases, case d will not invoke '__nvdimm_setup_pfn()'.
Instead, ndctl just does something like
'echo 0 > /sys/bus/nd/devices/namespace0.0/size'.

We have to delete this namespace from devm_memmap_vmcore in that case, so this
is an odd place, but it works. I tried to find a place pairing with
__nvdimm_setup_pfn(), but failed in the end. If you have any good idea, please
let me know :)

Thanks
Zhijian

> Ira
> 
>>  kfree(nspm->uuid);
>>  nspm->uuid = NULL;
>>  }
>> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
>> index af7d9301520c..80076996b2da 100644
>> --- a/drivers/nvdimm/pfn_devs.c
>> +++ b/drivers/nvdimm/pfn_devs.c
>> @@ -9,6 +9,7 @@
>>   #include <linux/slab.h>
>>   #include <linux/fs.h>
>>   #include <linux/mm.h>
>> +#include 
>>   #include "nd-core.h"
>>   #include "pfn.h"
>>   #include "nd.h"
>> @@ -716,6 +717,8 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap)
>>  } else
>>  return -ENXIO;
>>   
>> +devm_memmap_vmcore_update(ndns, altmap->base_pfn, PHYS_PFN(offset),
>> +  nd_pfn->mode == PFN_MODE_PMEM);
>>  return 0;
>>   }
>>   
>> -- 
>> 2.29.2
>>
> 
>