Eero,

On 02.05.2019 at 09:34, Eero Tamminen wrote:
Hi,

Before going on vacation I changed Hatari's call stack handling
to deal with kernel stack manipulations (changing rts to a jump)
by brute force. Call stack output now also includes symbol offset info.

I used these to look at the random LINE issue with udevadm on
the Debian installer initrd.  The backtrace of that looks weird, though:
---------------------------------------------------
...
[   54.880000] random: crng init done
[   61.660000] *** LINE 1111 ***   FORMAT=0
[   61.670000] Current process id is 50

1. CPU breakpoint condition(s) matched 1 times.
    pc = die_if_kernel :trace :noinit :file stack-show.ini

Reading debugger commands from 'stack-show.ini'...
profile stack
- 1. 0x00511a: trap_c -0x56 (return = 0x287c)
- 2. 0x002876: schedule -0x2c7b8e (return = 0x3ba5e)
  => trap() instruction calling trap_c()
- 3. 0x03ba58: schedule -0x28e9ac (return = 0x3ba5e)
  => worker_thread() instruction calling schedule()
- 4. 0x03ba58: system_call +0x3913c (return = 0xc00e5082)
  => worker_thread() instruction calling schedule()
- 5. 0xc00e507e: system_call (return = 0xc00418da)
  => do_fcntl()?
- 6. 0xc00418d6: system_call (return = 0xc00418da)
  => middle of current_is_async() RTS instruction -> non-translated user-space address from TT-RAM?
- 7. 0xc00418d6: system_call (return = 0xc00418da)
...
- 250. 0xc00418d6: system_call (return = 0xc00dbf8a)
- 251. 0xc00dbf86: system_call (return = 0xc00dcd26)
- 252. 0xc00dcd22: system_call (return = 0xc00418da)

[   61.680000] BAD KERNEL TRAP: 00000000
[   61.700000] Modules linked in:
[   61.710000] PC: [<00002a2c>] user_inthandler+0x4/0x20
[   61.720000] SR: 2c08  SP: bdb94e1a  a2: 0006e000
[   61.730000] d0: c0274000    d1: 000083ac    d2: 00000af9    d3: c021b81c
[   61.740000] d4: c0206000    d5: c001ed40    a0: c0206000    a1: c021d2c8
[   61.750000] Process udevadm (pid: 50, task=d01ed1a6)
[   61.760000] Frame format=0
[   61.770000] Stack from 05a4fff8:
[   61.770000]         0208c000 b6460110
[   61.780000] Call Trace:
[   61.790000] Code: 0000 2b24 508f 60ff ffff ff64 42a7 4878 <ffff> 2f00 48e7 7ce0 200f 0280 ffff e000 2440 2452 e9ef 010a 0032 0440 0038 2f0f
[   61.810000] Disabling lock debugging due to kernel taint
---------------------------------------------------

I get a large number of system_call() entries in the call stack only
when tracking exception calls in addition to subroutine calls
(this also happens when there are no oopses or bad traps).

This means that there are some system_call() traps whose matching
RTE instructions are missing.

Looking at the assembly for system_call(), after the system call
has finished, system_call() can end in RTE, but in some cases it
will (eventually) call schedule() instead.

Correct. The stack pointer in the above dump points to a user stack which supports your notion that a syscall or interrupt return is causing the fault.


And apparently during the last schedule() operation in the above call
stack, user_inthandler() gets called at offset 0x4, where there's no
valid opcode (as that address is between instructions):
---------------------------------------------------
d user_inthandler
user_inthandler:
$00002a28 : 42a7       clr.l     -(sp)
$00002a2a : 4878 ffff  pea       $ffffffff.w
$00002a2e : 2f00       move.l    d0,-(sp)
...
m $2a2c
00002A2C: ff ff 2f 00 ...
---------------------------------------------------

I assume the invalid "ffff" "opcode" causes the "line 1111" trap.

Right. And the fact that this address was used as a jump or rts target means something has smashed your stack. I don't see how this address could have got on the stack in any other way.
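
To illustrate why the word at $2a2c traps, here is a minimal sketch in plain C of the line-F test: on the 68k family, opcode words whose top four bits are 1111 are reserved for coprocessor instructions, and a word that does not decode to a valid one ends in the line-1111 exception. The helper names are hypothetical and not taken from the kernel or Hatari; the bytes are the ones from the disassembly above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the opcode word belongs to "line F" (top four bits are 1111). */
static bool is_line_f(uint16_t opcode)
{
        return (opcode >> 12) == 0xF;
}

/* Read one big-endian 16-bit word from a memory dump. */
static uint16_t fetch_word(const uint8_t *mem, size_t offset)
{
        return (uint16_t)((mem[offset] << 8) | mem[offset + 1]);
}

/* Self-check using the bytes shown for user_inthandler ($2a28 onwards). */
static void line_f_self_check(void)
{
        static const uint8_t code[] = { 0x42, 0xa7, 0x48, 0x78,
                                        0xff, 0xff, 0x2f, 0x00 };

        assert(!is_line_f(fetch_word(code, 0)));  /* $2a28: clr.l -(sp)    */
        assert(!is_line_f(fetch_word(code, 2)));  /* $2a2a: pea opcode     */
        assert(is_line_f(fetch_word(code, 4)));   /* $2a2c: operand of pea */
}
```

Calling line_f_self_check() passes silently; only the operand word in the middle of the pea instruction is classified as line F.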

Since the stack has been corrupted, you probably can't trust the stack contents at all when it comes to figuring out the sequence of function calls that got you there.

Even if the stack was intact, at the time of rte or schedule the stack has been restored to what it was before the syscall or interrupt happened. Anything on the stack at that point can't reflect kernel call history.

Not sure how to get information about the kernel call history - you might have to save the stack page upon returning from the syscall, before ret_from_syscall restores saved registers from the stack and you take the trap.
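
As a rough illustration of the "save the stack page" idea, the sketch below copies the page containing a given stack pointer out of a flat memory image, so it can be inspected after the original has been overwritten. It is written as host-side C (for example something a debugger could do), not as kernel code; the 4 KiB page size, the struct and all function names are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SNAP_PAGE_SIZE 4096u    /* assumption: 4 KiB pages */

struct stack_snapshot {
        uint32_t sp;                    /* stack pointer at capture time */
        uint32_t page_base;             /* address of the copied page    */
        uint8_t  data[SNAP_PAGE_SIZE];  /* raw copy of that page         */
};

/*
 * Copy the page containing sp out of mem, a flat image of memory that
 * starts at address mem_base and is mem_size bytes long.
 * Returns 0 on success, -1 if the page is not fully inside the image.
 */
static int snapshot_stack_page(struct stack_snapshot *snap,
                               const uint8_t *mem, uint32_t mem_base,
                               uint32_t mem_size, uint32_t sp)
{
        uint32_t base = sp & ~(SNAP_PAGE_SIZE - 1u);

        if (base < mem_base || base - mem_base > mem_size ||
            mem_size - (base - mem_base) < SNAP_PAGE_SIZE)
                return -1;

        snap->sp = sp;
        snap->page_base = base;
        memcpy(snap->data, mem + (base - mem_base), SNAP_PAGE_SIZE);
        return 0;
}

/*
 * Read a big-endian 32-bit value from the snapshot. The caller must make
 * sure addr .. addr + 3 lies inside the copied page.
 */
static uint32_t snapshot_read_long(const struct stack_snapshot *snap,
                                   uint32_t addr)
{
        const uint8_t *p = snap->data + (addr - snap->page_base);

        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

Taking such a snapshot just before the saved registers are restored, and comparing it with the stack contents at the time of the trap, would show which words changed in between.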


Because of the PS_S status register state, the kernel trap handler says
it's a bad trap, see [1].
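
The two checks involved can be sketched in stand-alone C as follows. The shift and mask mirror the vector computation in trap_c() quoted in [1]; the macro and function names are made up for this example, the S bit is bit 13 of the 68k status register, and vector 11 is the line-1111 emulator vector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SR_SUPERVISOR 0x2000u   /* S bit of the 68k status register */
#define VECNO_LINE_F  11u       /* line-1111 emulator exception     */

/* True if the saved SR says the trap was taken in supervisor mode. */
static bool trapped_in_supervisor(uint16_t sr)
{
        return (sr & SR_SUPERVISOR) != 0;
}

/* The frame stores the vector offset, which is the vector number * 4. */
static unsigned vector_number(uint16_t vector_offset)
{
        return (vector_offset >> 2) & 0xff;
}

static void trap_decode_self_check(void)
{
        /* "SR: 2c08" from the oops above: the S bit is set. */
        assert(trapped_in_supervisor(0x2c08));
        /* Vector offset 0x02c corresponds to vector 11. */
        assert(vector_number(0x02c) == VECNO_LINE_F);
}
```

With the S bit set in the saved SR, trap_c() takes the supervisor branch and, unless a fixup is found, ends up in bad_super_trap().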


Questions:

* Any idea what could cause a wrong offset into user_inthandler()?

* Does schedule(), or the rest of system_call(), clean up the exception stack?

No, that would be impossible to get portable. It's all handled on the system call return path in entry.S, either through ret_from_syscall or resume_userspace.


* What kind of size limits are there on the different stacks?

* Are there e.g. any symbols pointing to stack base addresses
  and stack size limits which I could use in conditional breakpoints?

The kernel task stack pointer is initialized to 8k past the start of the kernel data segment (see head.S). So your stack limit is 8k - beyond that, you're going to run into the rodata segment.

The user stack is set to just below TASK_SIZE initially (fs/exec.c), but the address sp points to looks like it's later changed to just below TASK_UNMAPPED_BASE. Geert or Andreas may know more details.
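
For the conditional breakpoint question, the range test itself is small. The sketch below assumes the 8k limit mentioned above and an 8k-aligned stack area, which would match the 0xffffe000 mask visible in the Code: line of the oops (0280 ffff e000). Function names are hypothetical, and treating 05a4fff8 as a kernel stack address is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KSTACK_SIZE 8192u       /* assumption: the 8k limit stated above */

/*
 * Lowest address of the 8k-aligned area containing sp. Only meaningful
 * while sp is strictly inside the stack area.
 */
static uint32_t kstack_base(uint32_t sp)
{
        return sp & ~(KSTACK_SIZE - 1u);
}

/*
 * True if sp lies in [base, base + KSTACK_SIZE]; the upper bound itself
 * is the position of an empty stack.
 */
static bool sp_within_kstack(uint32_t sp, uint32_t base)
{
        return sp >= base && sp - base <= KSTACK_SIZE;
}

static void kstack_self_check(void)
{
        /* "Stack from 05a4fff8" in the oops. */
        uint32_t base = kstack_base(0x05a4fff8u);

        assert(base == 0x05a4e000u);
        assert(sp_within_kstack(0x05a4fff8u, base));
        /* The SP value printed in the oops is far outside that area. */
        assert(!sp_within_kstack(0xbdb94e1au, base));
}
```

The same two comparisons could be expressed as a breakpoint condition on the stack pointer once the base for the task of interest is known.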

Cheers,

        Michael


    - Eero

[1] Looking at kernel code, above call stack comes from:
  -> trap_c()
     -> bad_super_trap()
        -> die_if_kernel()

Yes, but isn't that just the call sequence to report the bogus line-f trap, caused by a corrupted return address on the stack?


Corresponding kernel code is:
---------------------------------------------------
ENTRY(trap)
        SAVE_ALL_INT
        GET_CURRENT(%d0)
        movel   %sp,%sp@-               | stack frame pointer argument
        jbsr    trap_c
...
asmlinkage void trap_c(struct frame *fp)
{
        int sig, si_code;
        void __user *addr;
        int vector = (fp->ptregs.vector >> 2) & 0xff;

        if (fp->ptregs.sr & PS_S) {
                if (vector == VEC_TRACE) {
                        /* traced a trapping instruction on a 68020/30,
                         * real exception will be executed afterwards.
                         */
                        return;
                }
#ifdef CONFIG_MMU
                if (fixup_exception(&fp->ptregs))
                        return;
#endif
                bad_super_trap(fp);
                return;
        }
...
void bad_super_trap (struct frame *fp)
{
        int vector = (fp->ptregs.vector >> 2) & 0xff;

        console_verbose();
        if (vector < ARRAY_SIZE(vec_names))
                pr_err("*** %s ***   FORMAT=%X\n",
                        vec_names[vector],
                        fp->ptregs.format);
        else
                pr_err("*** Exception %d ***   FORMAT=%X\n",
                        vector, fp->ptregs.format);
        if (vector == VEC_ADDRERR && CPU_IS_020_OR_030) {
...
        }
        pr_err("Current process id is %d\n", task_pid_nr(current));
        die_if_kernel("BAD KERNEL TRAP", &fp->ptregs, 0);
}
...
void die_if_kernel (char *str, struct pt_regs *fp, int nr)
{
        if (!(fp->sr & PS_S))
                return;

        console_verbose();
        pr_crit("%s: %08x\n", str, nr);
        show_registers(fp);
        add_taint(TAINT_DIE, LOCKDEP_NOW_UNRELIABLE);
        do_exit(SIGSEGV);
}
...
int fixup_exception(struct pt_regs *regs)
{
        const struct exception_table_entry *fixup;
        struct pt_regs *tregs;

        /* Are we prepared to handle this kernel fault? */
        fixup = search_exception_tables(regs->pc);
        if (!fixup)
                return 0;
...
const struct exception_table_entry *search_exception_tables(unsigned long addr)
{
        const struct exception_table_entry *e;

        e = search_extable(__start___ex_table,
                           __stop___ex_table - __start___ex_table, addr);
        if (!e)
                e = search_module_extables(addr);
        return e;
}
...
const struct exception_table_entry *search_module_extables(unsigned long addr)
{
        const struct exception_table_entry *e = NULL;
        struct module *mod;

        preempt_disable();
        mod = __module_address(addr);
        if (!mod)
                goto out;

        if (!mod->num_exentries)
                goto out;

        e = search_extable(mod->extable,
                           mod->num_exentries,
                           addr);
out:
        preempt_enable();

        /*
         * Now, if we found one, we are running inside it now, hence
         * we cannot unload the module, hence no refcnt needed.
         */
        return e;
}
...
const struct exception_table_entry *
search_extable(const struct exception_table_entry *base,
               const size_t num,
               unsigned long value)
{
        return bsearch(&value, base, num,
                       sizeof(struct exception_table_entry),
                       cmp_ex_search);
}
...
#define EXCEPTION_TABLE(align)                                          \
        . = ALIGN(align);                                               \
        __ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {               \
                __start___ex_table = .;                                 \
                KEEP(*(__ex_table))                                     \
                __stop___ex_table = .;                                  \
        }
...
SECTIONS
{
  . = 0x1000;
  _text = .;                    /* Text and read-only data */
  .text : {
        HEAD_TEXT
        TEXT_TEXT
        IRQENTRY_TEXT
        SOFTIRQENTRY_TEXT
        SCHED_TEXT
        CPUIDLE_TEXT
        LOCK_TEXT
        *(.fixup)
        *(.gnu.warning)
        } :text = 0x4e75

  _etext = .;                   /* End of text section */

  EXCEPTION_TABLE(16)

  _sdata = .;                   /* Start of data section */

  RODATA
