Re: Maple PPC970 kexec crash-dump problems

2009-02-04 Thread Benjamin Walsh
Hi Milton,

I've tracked it down to the device tree passed to the second kernel being
screwed up when patched by kexec-tools. Namely, it was creating
linux,usable-memory entries that were wrong, and the MMU initialization hung
when it failed to allocate memory for the page tables. I hacked the tool and got
past that point in the init sequence, but the very first I/O-mapped access
fails, so the MMU still doesn't seem to be set up correctly.

Anyway, on to my question: is the crash dump (kdump) kernel supposed to use
the memory reserved for it by the first kernel as its working memory? E.g.,
on that board, I have 0-2GB and 4-6GB for a total of 4GB of RAM. Let's say
I reserve 1...@32m, that's 0x200-0xa00. Is the second kernel
supposed to use

(0x200+kernel size) - 0xa00

for its memory pool and leave everything else:

0-0x200, 0xa00 - 8000, 0x1 - 0x18000

as memory that is from the first kernel, used to debug it ?

Basically, I am trying to figure out if I patched the tool correctly.
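
The split being asked about can be sketched as plain interval arithmetic. This is only an illustration, not what kexec-tools actually does: the two RAM banks come from the message above, the 32 MiB base comes from the "@32m" part of the reservation, and the 128 MiB size is an assumption, since the size is elided in the archived text. The helper name split_ranges is hypothetical.

```python
MIB = 1 << 20
GIB = 1 << 30


def split_ranges(ram_banks, reserved):
    """Split RAM banks into (ranges inside the reservation, ranges outside it).

    All ranges are half-open (start, end) byte addresses.
    """
    r_start, r_end = reserved
    usable, preserved = [], []
    for start, end in ram_banks:
        lo, hi = max(start, r_start), min(end, r_end)
        if lo < hi:
            # The bank overlaps the reservation: keep the overlap as usable
            # and whatever is left on either side as first-kernel memory.
            usable.append((lo, hi))
            if start < lo:
                preserved.append((start, lo))
            if hi < end:
                preserved.append((hi, end))
        else:
            preserved.append((start, end))
    return usable, preserved


# RAM banks as described in the message: 0-2GB and 4-6GB.
banks = [(0, 2 * GIB), (4 * GIB, 6 * GIB)]
# ASSUMPTION: 128 MiB reserved at 32 MiB; the size is not legible in the archive.
reserved = (32 * MIB, 32 * MIB + 128 * MIB)

usable, preserved = split_ranges(banks, reserved)
for name, ranges in (("crash kernel", usable), ("first kernel", preserved)):
    for start, end in ranges:
        print(f"{name}: {start:#x} - {end:#x}")
```

Under those assumptions the reservation ends up as a single range for the second kernel, and the rest of RAM falls into three ranges that belong to the first kernel's image.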

Thanks,
Ben

On Sat, Jan 24, 2009 at 2:52 AM, Milton Miller milt...@bga.com wrote:

 On Sat Jan 24 at 07:59:47 EST in 2009, Benjamin Walsh wrote:

 I am trying to use kexec with a crash dump kernel on a Maple board
 (Motorola
 ATCA6101 to be precise). This board is running a two-CPU PPC970FX. I am
 running a 2.6.27-10 kernel and have tried both older kexec-tools and the
 newest ones. I have tried SMP and non-SMP kernels.


 Once you start the second cpu it is likely executing instructions somewhere.

 Prior to 2.6.27 you had to compile a fixed offset kernel to run kdump.
 With 2.6.27 that option was removed and replaced with the relocatable
 kernel.  However, because of the way linux interacts with open firmware, the
 kernel will still move itself to 0 unless a specific flag is set.   The
 location of the flag was changed twice during the merge process, and the
 patches for kexec-tools were not made until early this year.

  Using kexec -l to fast boot works correctly. However, loading a crash dump
 kernel and triggering a crash via echo c > /proc/sysrq-trigger simply hangs
 the board. I have traced the sequence down to after the call to
 kexec_copy_flush(), when the CPU returns to real-address mode (bl
 real_mode). At this point I have no further debugging information.



  Two things could help me:

 - Getting the fix if this is a known issue and a fix exists. I have looked
 at recent patches and nothing leapt to mind, mostly relocatable kernel
 support.


 That is a major change.

 That said, I don't know if anyone has tested kexec panic beyond pseries for
 64 bit powerpc.

 I know Paul originally prototyped the relocatable patch on a powermac, but
 I don't know what if any smp testing he performed.   And you said you are
 actually on maple not a powermac, so the startup issues are different.

  - Obtaining the address of the serial port @3f8 in real mode. The init
 sequence with udbg ON says that the physical address of the port is
 0xf40003f8; however, setting it up in poll mode and trying to stuff
 characters in the tx buffer doesn't produce anything.


 Ah yes.  In real mode you can only talk to cacheable memory without
 implementation specific assistance.  However, if you look in the kernel for
 the maple early udbg support, you will find the code you need to talk to
 that serial port in real mode.


 Has anyone recently tried to use the serial port in real mode ?

 Thanks for any help.

 Ben


 Hope this gets you started.  I wrote a lot of the kernel code, but I had
 the advantage of external jtag access to the processor to see where it
 ended up when it went astray.

 milton


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Maple PPC970 kexec crash-dump problems

2009-01-23 Thread Benjamin Walsh
Hi all,

I am trying to use kexec with a crash dump kernel on a Maple board (Motorola
ATCA6101 to be precise). This board is running a two-CPU PPC970FX. I am
running a 2.6.27-10 kernel and have tried both older kexec-tools and the
newest ones. I have tried SMP and non-SMP kernels.

Using kexec -l to fast boot works correctly. However, loading a crash dump
kernel and triggering a crash via echo c > /proc/sysrq-trigger simply hangs
the board. I have traced the sequence down to after the call to
kexec_copy_flush(), when the CPU returns to real-address mode (bl
real_mode). At this point I have no further debugging information.

Two things could help me:

- Getting the fix if this is a known issue and a fix exists. I have looked
at recent patches and nothing leapt to mind, mostly relocatable kernel
support.
- Obtaining the address of the serial port @3f8 in real mode. The init
sequence with udbg ON says that the physical address of the port is
0xf40003f8; however, setting it up in poll mode and trying to stuff
characters in the tx buffer doesn't produce anything.

Has anyone recently tried to use the serial port in real mode ?

Thanks for any help.

Ben

Repost from linuxppc-embedded: NMI and AMD8131/8111 on Maple board

2008-11-28 Thread Benjamin Walsh
Hi all,

I am reposting this from linuxppc-embedded, as I might get more answers here.
Anything that can move this case forward is greatly appreciated. Thanks.

Here is the current thread:

Benjamin Walsh wrote:
  Hi all,
 
  I've written EDAC support for the AMD8131/8111 chips that are present on a
  Maple board (PPC970FX with IBM CPC925 memory controller/bridge), currently
  running in poll mode. I am now trying to get this to work in interrupt mode.
  These two chipsets have a feature that enables triggering an NMI when an
  error is detected (PERR and SERR). How can this be hooked into the interrupt
  system on a PPC board ?
 
  From what I understand from the doc for these chipsets, the NMI will be
  delivered as a HT message to the CPC925 on this board. What I don't get is
  how this will be delivered to the CPU, and on what interrupt line. The HT
  message sent to the CPC925 is the following:
 
  MT = NMI
  TM = edge
  DM = physical
  INTRDEST = 'hFF (all)
  VECTOR = 'h00 (does not matter)
 

 The AMD8131/8130 can generate an NMI to the CPC925. There is an interrupt
 controller residing in the CPC925, and as you know the CPC925 is attached to
 the PowerPC PPC970FX. The interrupt controller collects and distributes system
 interrupts from the PCI Express and HyperTransport blocks. So you should work
 out the interrupt mapping from the details of your system. Often this
 information is defined in the corresponding device tree (dts).

 Best Regards
 Tiejun

The only DTS I have is the one I extracted from a running target. This
is part of the entry for the 8111:

[EMAIL PROTECTED] {
ranges = 0x8100 0x0 0x0 0x0 0xf400 0x0
0x40 0x8200 0x0 0x8000 0x0 0x8000 0x0 0x7000;
reg = 0x0 0xf200 0x300;
device_type = ht;
bus-range = 0x0 0x5;
compatible = u3-ht;
interrupt-map-mask = 0xf800 0x0 0x0 0x7;
interrupt-map = 
0x0900 0x0 0x0 0x0 0x6103fa00 0x00 0x1
0x1100 0x0 0x0 0x0 0x6103fa00 0x00 0x1
0x1900 0x0 0x0 0x0 0x6103fa00 0x00 0x1
0x2100 0x0 0x0 0x0 0x6103fa00 0x00 0x1
0x3000 0x0 0x0 0x0 0x6103fa00 0x00 0x1
0x3200 0x0 0x0 0x4 0x6103fa00 0x19 0x1
0x3300 0x0 0x0 0x0 0x6103fa00 0x00 0x1
0x3400 0x0 0x0 0x3 0x6103fa00 0xff 0x1
0x3500 0x0 0x0 0x2 0x6103fa00 0x17 0x1
0x3600 0x0 0x0 0x2 0x6103fa00 0x17 0x1
0x3700 0x0 0x0 0x0 0x6103fa00 0x00 0x1;
#address-cells = 0x3;
linux,phandle = 0x61043600;
name = ht;
#interrupt-cells = 0x1;
#size-cells = 0x2;

And this is the CPC925, with its interrupt controller:

[EMAIL PROTECTED] {
reg = 0xf800 0x100;
device_type = memory-controller;
compatible = u3;
#address-cells = 0x1;
linux,phandle = 0x61044000;
name = hostbridge;
#size-cells = 0x1;

[EMAIL PROTECTED] {
reg = 0xf8033000 0x7000;
device_type = dart;
compatible = u3-dart, dart;
linux,phandle = 0x61045e00;
name = dart;
};

[EMAIL PROTECTED] {
reg = 0xf804 0x4;
device_type = open-pic;
interrupt-controller;
compatible = open-pic;
big-endian;
built-in;
#address-cells = 0x0;
linux,phandle = 0x6103fa00;
name = interrupt-controller;
clock-frequency = 0x0;
#interrupt-cells = 0x2;
};

And I think this is the part of the LPC bridge entry on the 8111:

[EMAIL PROTECTED] {
min-grant = 0x0;
ranges = 0x1 0x0 0x1003000 0x0 0x0 0x1;
reg = 0x3000 0x0 0x0 0x0 0x0;
device_type = isa;
revision-id = 0x5;
66mhz-capable;
max-latency = 0x0;
class-code = 0x60100;
vendor-id = 0x1022;
linux,phandle = 0x610dfa00;
name = isa;
device-id = 0x7468;

The LPC bridge is supposed to be able to generate an NMI on error. Am
I right in saying that the 0x3000 entry in the [EMAIL PROTECTED] interrupt-map
corresponds to the LPC bridge ? If so, the mapping I can read from
there is 0-0. 0 is an internal interrupt of the CPC925
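
The reading of the interrupt-map above can be checked with a small lookup sketch. It assumes the usual device-tree matching rule (the child unit address and interrupt specifier are compared with each row under interrupt-map-mask) and copies the cell values exactly as they appear in the archived dump, where digits may have been lost. With #address-cells = 3 and #interrupt-cells = 1 on the child, and 0 and 2 on the parent, each row is 3 + 1 + 1 + 0 + 2 = 7 cells. The function name lookup is hypothetical, and reading the two parent cells as interrupt number and sense is an assumption.

```python
# Values copied verbatim from the archived dump; they may be incomplete.
INTERRUPT_MAP_MASK = (0xF800, 0x0, 0x0, 0x7)
INTERRUPT_MAP = [
    # child unit address (3 cells), child irq, parent phandle, parent spec (2 cells)
    (0x0900, 0x0, 0x0, 0x0, 0x6103FA00, 0x00, 0x1),
    (0x1100, 0x0, 0x0, 0x0, 0x6103FA00, 0x00, 0x1),
    (0x1900, 0x0, 0x0, 0x0, 0x6103FA00, 0x00, 0x1),
    (0x2100, 0x0, 0x0, 0x0, 0x6103FA00, 0x00, 0x1),
    (0x3000, 0x0, 0x0, 0x0, 0x6103FA00, 0x00, 0x1),
    (0x3200, 0x0, 0x0, 0x4, 0x6103FA00, 0x19, 0x1),
    (0x3300, 0x0, 0x0, 0x0, 0x6103FA00, 0x00, 0x1),
    (0x3400, 0x0, 0x0, 0x3, 0x6103FA00, 0xFF, 0x1),
    (0x3500, 0x0, 0x0, 0x2, 0x6103FA00, 0x17, 0x1),
    (0x3600, 0x0, 0x0, 0x2, 0x6103FA00, 0x17, 0x1),
    (0x3700, 0x0, 0x0, 0x0, 0x6103FA00, 0x00, 0x1),
]
CHILD_CELLS = 4  # 3 address cells + 1 interrupt cell


def lookup(unit_address, interrupt):
    """Return (parent phandle, parent specifier) for the first matching row, or None."""
    key = (*unit_address, interrupt)
    for row in INTERRUPT_MAP:
        child = row[:CHILD_CELLS]
        # A row matches when key and row agree on every bit selected by the mask.
        if all(((k ^ c) & m) == 0 for k, c, m in zip(key, child, INTERRUPT_MAP_MASK)):
            return row[CHILD_CELLS], row[CHILD_CELLS + 1:]
    return None


result = lookup((0x3000, 0x0, 0x0), 0x0)
if result is not None:
    parent, spec = result
    print(f"parent phandle {parent:#x}, specifier {[hex(c) for c in spec]}")
```

With the values as printed, the 0x3000 entry resolves to the controller whose linux,phandle is 0x6103fa00, with specifier cells 0x00 and 0x1, which agrees with the mapping read off by hand.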