Hello,

I have spent the last two weeks debugging kexec and kdump on the Cell 
platform.  I got my kernels to boot on the Cell system and the Cell 
simulator, but I am baffled, because the changes that I made are in the 
common powerpc code.  If I am right, kexec doesn't work on powerpc.  I 
must have gone astray...

First, I am loading my kernel using 

kexec -l vmlinux        or      kexec -l vmlinux --append="maxcpus=0"

This results in a bad start pointer.  Specifically,

image->start = image->memory[1].mem

The entire kernel is loaded into image->memory[0].mem and I don't know 
what is supposed to be in the second and third memory segments, perhaps 
glue code, but whatever it is, it doesn't work.  The system hangs 
executing code in the second segment.

I changed the kernel so that 

image->start = image->memory[0].mem + KERNELBASE

This allows the kernel start routine to be invoked properly when I enter 
the command kexec -e.
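
The two choices of start address described above can be sketched in 
user-space C.  This is only an illustration under stated assumptions: the 
structs are simplified stand-ins for the kernel's kimage and segment 
structures, the KERNELBASE value is the usual ppc64 one but is assumed 
here, and the function names start_observed() and start_patched() are 
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value; ppc64 kernels are normally linked at this address. */
#define KERNELBASE 0xc000000000000000ULL

/* Simplified stand-in for one kexec memory segment. */
struct segment {
    uint64_t mem;    /* destination address of the segment */
    uint64_t memsz;  /* size of the segment in bytes */
};

/* Simplified stand-in for the kexec image descriptor. */
struct image {
    uint64_t start;            /* address jumped to on kexec -e */
    unsigned int nr_segments;  /* number of valid entries in memory[] */
    struct segment memory[3];
};

/* Start address as described above: taken from the second segment. */
static uint64_t start_observed(const struct image *img)
{
    assert(img->nr_segments > 1);
    return img->memory[1].mem;
}

/* Start address after the change: segment 0 offset by KERNELBASE. */
static uint64_t start_patched(const struct image *img)
{
    assert(img->nr_segments > 0);
    return img->memory[0].mem + KERNELBASE;
}
```

With the kernel in segment 0 at address 0, start_patched() yields 
KERNELBASE itself, while start_observed() yields wherever the second 
segment happens to have been placed.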

The next problem is that the second kernel expects the device tree, i.e. 
initial_boot_params, to be passed to it as an input parameter.  However, 
the first kernel passes the physical cpu id instead, which the second 
kernel incorrectly assigns to initial_boot_params, resulting in a system 
crash in early_init_devtree().  To fix this problem, I loaded 
initial_boot_params in the first kernel and passed it to the second one. 
This meant changing the kexec-style kernel entry point, as I needed to 
pass both initial_boot_params and the physical cpu id.

You could conceivably do without the physical cpu id in the second kernel, 
but I think it is needed there for a variety of reasons.  First, the 
secondary cpus depend on the physical cpu id in the kexec_wait() / 
secondary_hold() code paths [r3], and I think it is bad practice for two 
cpus to think, even for a little while, that they are the same physical 
cpu.  Second, I don't think that you want to change paca assignments 
between the two kernels.  If paca[0] is associated with physical cpu X in 
the first kernel, it should be in the second kernel as well.  Finally, and 
this is the most important point, the physical cpu id must be passed to 
the second kernel so that initial_boot_params.boot_cpuid_phys can be 
updated to reflect the identity of the cpu running kexec, and the same 
code paths are taken each time a kexec-style boot takes place.  There is 
physical-cpu-specific logic in the kernel initialization code.  See 
early_init_dt_scan_cpus().

So in the kdump kernel:

        initial_boot_params->boot_cpuid_phys = phys_cpu_id;

and this enabled the kernel to boot.
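
A minimal sketch of that fix-up in plain C follows.  The struct is a 
cut-down stand-in for the flattened device tree header that keeps only 
two fields, and fixup_boot_cpuid() is a hypothetical helper name, not an 
existing kernel function.  The magic check is an extra guard added for 
the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define OF_DT_HEADER 0xd00dfeedu  /* flattened device tree magic */

/* Cut-down stand-in for the device tree header; other fields omitted. */
struct boot_params {
    uint32_t magic;
    uint32_t boot_cpuid_phys;  /* physical id of the booting cpu */
};

/*
 * Hypothetical helper: record which physical cpu performs the kexec boot.
 * Returns 0 on success, -1 if the blob does not carry the device tree
 * magic (in which case it is left untouched).
 */
static int fixup_boot_cpuid(struct boot_params *params, uint32_t phys_cpu_id)
{
    assert(params != 0);
    if (params->magic != OF_DT_HEADER)
        return -1;
    params->boot_cpuid_phys = phys_cpu_id;
    return 0;
}
```

The same helper could run in either kernel; only the place where 
phys_cpu_id comes from would differ.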

In retrospect, I could probably make this assignment in the kexec kernel 
(first kernel), not pass the physical cpu id at all, and leave the 
interface unchanged.  Ah hah.

I can send patches, but I must have gotten some of this wrong, or more 
people would be complaining about kexec.

regards,
Luke
_______________________________________________
fastboot mailing list
[email protected]
https://lists.osdl.org/mailman/listinfo/fastboot