On Wed, 2026-06-10 at 22:33 +0530, Aditya Gupta wrote:
> On 20/05/26 14:58, Shivang Upadhyay wrote:
> 
> > When a ppc machine is started with maxcpus greater than the number
> > of present CPUs (e.g., -smp cpus=2,maxcpus=8), firmware allocates
> > the CPU state data buffer based on maxcpus and advertises this size
> > to the kernel during fadump registration.
> >
> > However, fadump currently fails to generate a vmcore during the
> > rebooted kernel because:
> >
> > 1. The CPU state buffer length does not match what firmware
> >    advertised (QEMU only populates entries for present CPUs, not
> >    maxcpus)
> > 2. The kernel cannot find CPU data for all maxcpus slots
>
> Hello Shivang, same thing from our discussion offline, can you see if
> this fix belongs in the kernel instead of qemu?
> 
> PAPR (section H.1) states:
> 
> > Only CPUs that are online at the start of the Firmware Assisted
> > Dump will have their register data saved.
>
> QEMU's implementation is aligned with PAPR.
> The PAPR doesn't say anything about the offline CPUs, so that may be
> considered as undefined.
> Let's see if it's an assumption from kernel side, and if it's
> possible to be fixed there, else we will take this patch in qemu.
> 
> Thanks,
> - Aditya G

Hi Aditya,

I get your point about PAPR: it only mentions saving online CPUs and
doesn't go into detail about the case of offline/inactive/not-added
CPUs.

But the case of DLPAR'ed CPUs is interesting. In my opinion, it would
be important to save the state of new CPUs.

Anyway, regarding the kernel fix: there is this concerning check in
the kernel [1].


    switch (type) {
    case RTAS_FADUMP_CPU_STATE_DATA:
    case RTAS_FADUMP_HPTE_REGION:
    case RTAS_FADUMP_REAL_MODE_REGION:
        if (fdm_active->rgn[i].error_flags != 0) {
            pr_err("Dump taken by platform is not valid (%d)\n", i);
            rc = -EINVAL;
        }
        if (fdm_active->rgn[i].bytes_dumped != fdm_active->rgn[i].source_len) {
            pr_err("Dump taken by platform is incomplete (%d)\n", i);
            rc = -EINVAL;
        }


Basically, the kernel rejects the dump (rc = -EINVAL) if the size
advertised by firmware is not the same as the size it receives at
fadump time. In QEMU, the advertised size comes from here [2]:

    /*
     * CPU State Data contains multiple fields such as header, num_cpus
     * and register entries
     *
     * Calculate the maximum CPU State Data size, according to maximum
     * possible CPUs the QEMU VM can have
     *
     * This calculation must match the 'cpu_state_len' calculation done
     * in 'populate_cpu_state_data' in spapr_fadump.c
     */
    fadump_cpu_state_size += sizeof(struct FadumpRegSaveAreaHeader);
    fadump_cpu_state_size += 0xc;                   /* padding as in PAPR */
    fadump_cpu_state_size += sizeof(uint32_t);      /* num_cpus */
    fadump_cpu_state_size += max_possible_cpus *    /* reg entries */
                             FADUMP_PER_CPU_REG_ENTRIES *
                             sizeof(struct FadumpRegEntry);


So it's the same problem here: QEMU too advertises to the kernel that
collection needs to happen for max_possible_cpus. At least this needs
to change. But if we make this change, what will we fill the empty
CPUs' slots with?
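
To make the mismatch concrete, here is a rough standalone sketch of the
size arithmetic from [2] next to the length comparison from [1]. The
struct layouts and the PER_CPU_REG_ENTRIES value are assumptions picked
for illustration, not the real QEMU definitions, and cpu_state_size()
and dump_is_complete() are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layouts assumed for illustration; the real ones live in QEMU's headers. */
struct reg_save_area_header {
    uint64_t magic_number;
    uint32_t version;
    uint32_t num_cpu_offset;
};

struct reg_entry {
    uint64_t reg_id;
    uint64_t reg_value;
};

/* Hypothetical placeholder, NOT QEMU's FADUMP_PER_CPU_REG_ENTRIES value. */
#define PER_CPU_REG_ENTRIES 4

/* Same shape as the calculation in [2], parameterised on the CPU count. */
static size_t cpu_state_size(unsigned int n_cpus)
{
    size_t size = 0;

    size += sizeof(struct reg_save_area_header);
    size += 0xc;                 /* padding as in PAPR */
    size += sizeof(uint32_t);    /* num_cpus */
    size += (size_t)n_cpus * PER_CPU_REG_ENTRIES * sizeof(struct reg_entry);
    return size;
}

/*
 * Mirrors the bytes_dumped != source_len comparison in [1]: the length
 * written for present CPUs must equal the length advertised for max CPUs.
 */
static int dump_is_complete(unsigned int present_cpus, unsigned int max_cpus)
{
    assert(present_cpus <= max_cpus);
    return cpu_state_size(present_cpus) == cpu_state_size(max_cpus);
}
```

With -smp cpus=2,maxcpus=8, dump_is_complete(2, 8) is false under these
assumptions, which is the case the kernel check flags as incomplete.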

~Shivang.

[1]https://elixir.bootlin.com/linux/v6.15.6/source/arch/powerpc/platforms/pseries/rtas-fadump.c#L458

[2]https://elixir.bootlin.com/qemu/v11.0.1/source/hw/ppc/spapr.c#L941
