Mark,

Since my last reply, I had him run a complete trace of the IPL until it started looping. I have reviewed the trace and found an invalid instruction not long after IPL.

After asking more questions, I found out that he was doing something I did not know about.

The restored system was being tested not on their current CPU, but on the one at their DR site. The failing instruction is valid on their production machine but invalid on their DR machine: z114 vs. z10.

Sometimes I just want to reach through the phone lines and see if I can relocate their head to a lower orifice.

Tony Thigpen

Mark Post wrote on 2/25/22 17:13:
On 2/25/22 16:40, Tony Thigpen wrote:
Mark,

I did not think of tracing. Yes, it is in a tight loop. Here is a bit of
the log:

-> 00000000000039D0   STMG    EB0F02000024 >> 0000000000000200     CC 0
    00000000000039D6   AGHI    A7FBFF38    CC 1
    00000000000039DA   LARL    C0E0FFFFF9FB    CC 1
    00000000000039E0   BASR    0DEE     -> 0000000000002DD0     CC 1
-> 0000000000002DD0   STMG    EBDFF0880024 >> FFFFFFFFFB274FB8     CC 1
** 0000000000002DD0       PROG    0005 -> 00000000000039D0
              ADDRESSING
-> 00000000000039D0   STMG    EB0F02000024 >> 0000000000000200     CC 0
    00000000000039D6   AGHI    A7FBFF38    CC 1
    00000000000039DA   LARL    C0E0FFFFF9FB    CC 1
    00000000000039E0   BASR    0DEE     -> 0000000000002DD0     CC 1
-> 0000000000002DD0   STMG    EBDFF0880024 >> FFFFFFFFFB274EF0     CC 1
** 0000000000002DD0       PROG    0005 -> 00000000000039D0
              ADDRESSING
-> 00000000000039D0   STMG    EB0F02000024 >> 0000000000000200     CC 0
    00000000000039D6   AGHI    A7FBFF38    CC 1
    00000000000039DA   LARL    C0E0FFFFF9FB    CC 1
    00000000000039E0   BASR    0DEE     -> 0000000000002DD0     CC 1
-> 0000000000002DD0   STMG    EBDFF0880024 >> FFFFFFFFFB274E28     CC 1
** 0000000000002DD0       PROG    0005 -> 00000000000039D0
              ADDRESSING

Yeah, that's going to be a pain to debug. The System.map file in /boot would indicate what routine that's in, but after that it's most likely going to require a kernel expert to figure out. You can tell that the STMG instruction is trying to address what appears to be a negative (or extremely large) address and getting an addressing exception. But how it got there is a much bigger problem to figure out.
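A minimal Python sketch of the two checks described above. It assumes the trace addresses can be compared directly with the addresses in System.map, which may not hold if the kernel was relocated during IPL. `to_signed64` shows that the STMG target `FFFFFFFFFB274FB8` is negative when read as a signed 64-bit value. `to_signed16` decodes the AGHI immediate `FF38` (from `A7FBFF38`, i.e. AGHI on R15, the s390 stack pointer) as -200, which matches the 200-byte (0xC8) drop in the STMG target on each pass through the loop. `parse_system_map` and `lookup` find the nearest symbol at or below an address in a System.map file of the usual `address type symbol` form. All function names here are hypothetical helpers, not part of any existing tool.

```python
import bisect

# Hypothetical helpers for reading the trace above; not part of any existing tool.

def to_signed64(value: int) -> int:
    """Read a 64-bit register or address value as two's-complement signed."""
    return value - (1 << 64) if value & (1 << 63) else value

def to_signed16(value: int) -> int:
    """Read a 16-bit immediate (e.g. the I2 field of AGHI) as signed."""
    return value - (1 << 16) if value & (1 << 15) else value

def parse_system_map(lines):
    """Parse System.map lines of the form '<hex address> <type> <symbol>'.

    Returns (address, symbol) tuples sorted by address; unparsable lines are skipped.
    """
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        try:
            addr = int(parts[0], 16)
        except ValueError:
            continue
        entries.append((addr, parts[2]))
    entries.sort()
    return entries

def lookup(entries, address):
    """Return (symbol, offset) for the nearest symbol at or below address, or None."""
    addrs = [a for a, _ in entries]
    i = bisect.bisect_right(addrs, address) - 1
    if i < 0:
        return None
    sym_addr, name = entries[i]
    return name, address - sym_addr

if __name__ == "__main__":
    # STMG target addresses copied from the three loop iterations in the trace.
    targets = [0xFFFFFFFFFB274FB8, 0xFFFFFFFFFB274EF0, 0xFFFFFFFFFB274E28]
    print([to_signed64(t) < 0 for t in targets])          # [True, True, True]
    print([a - b for a, b in zip(targets, targets[1:])])  # [200, 200]
    print(to_signed16(0xFF38))                            # -200 (AGHI A7FBFF38)
    # Hypothetical path: adjust to the System.map for the kernel that was IPLed
    # (the file name usually carries a kernel version suffix).
    # with open("/boot/System.map") as f:
    #     print(lookup(parse_system_map(f), 0x2DD0))
```

Run against the real System.map, `lookup(entries, 0x2DD0)` would name the routine containing the failing STMG, which is the first step suggested above.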

What version of SLES is this? What model machine is this running on? I'm assuming it's the same machine as before, and just the DASD was swapped out, but you never know.


Mark Post

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit
http://www2.marist.edu/htbin/wlvindex?LINUX-390
