On Fri, 24 Jan 2025, Ivan Kokshaysky wrote:
> > > > Indeed, SP_OFF in entry.S is the main suspect at the moment.
> > >
> > > In fact, it's the odd number of longs (29) in struct pt_regs that makes
> > > the stack misaligned by 8 bytes. The patch below works for me - no more
> > > oopses in rcu-torture test.
> > >
> > > Unless I'm missing something, this change shouldn't have any ill effects.
> >
> > Umm, this is a part of UAPI, and the change in alignment changes the ABI
> > (think padding where `struct pt_regs' has been embedded into another
> > structure), so AFAICT it is a no-no.
>
> Well, the only userspace applications I can think of that need kernel
> stack layout are debuggers, but at least alpha gdb doesn't use this header.
> Doesn't matter, though - padding *after* PAL-saved registers is wrong
> thing to do. I think it's the reason for oopses that Magnus reported
> today.
>
> A "long" padding memder of pt_regs placed *before* PAL-saved registers
> would be a proper fix for kernel, but it most likely would break gdb...
>
> > But the only place I could quickly find this should matter for is this:
> >
> > /* ... and find our stack ... */
> > lda $30,0x4000 - SIZEOF_PT_REGS($8)
> >
> > which should be straightforward to fix:
> >
> > lda $30,0x4000 - ((SIZEOF_PT_REGS + 15) & ~15)($8)
> >
> > or suchlike. Have I missed anything?
>
> That's the first thing I thought of too, but no, it's just a kernel
> entry point after the bootloader. The stack pointer of kernel threads
> is assigned in alpha/kernel/process.c. Particularly, these macros
> in ptrace.h (non-uapi) are interesting:
>
> #define task_pt_regs(task) \
> ((struct pt_regs *) (task_stack_page(task) + 2*PAGE_SIZE) - 1)
>
> #define current_pt_regs() \
> ((struct pt_regs *) ((char *)current_thread_info() + 2*PAGE_SIZE) - 1)
>
> I'll try to play with alignment here, but it will take some time.
So after a crash course in PALcode stack frames I have come up with the
following WIP patch that works for me. If things go well, I'll clean it
up a little and turn into a proper patch submission. Not that I think I
can make the end result particularly pretty, there's no easy way AFAICT.
NB with some instrumentation here's what gets reported for stack without:
start_kernel: SP: fffffc0000dcfe98
do_entInt: SP: fffffc0000dcfc30
copy_thread: SP: fffffc0000dcfc98, regs: fffffc0000dcff18, childregs:
fffffc0001837f18, childstack: fffffc0001837ed8
do_page_fault: SP: fffffc0001837bc8
sys_exit_group: SP: fffffc0002917ef8
do_entUnaUser: SP: fffffc0001f33e70
do_entArith: SP: fffffc0001f33ee8
do_entIF: SP: fffffc000184bee8
and with the patch:
start_kernel: SP: fffffc0000dcfe90
do_entInt: SP: fffffc0000dcfc20
copy_thread: SP: fffffc0000dcfc90, regs: fffffc0000dcff18, childregs:
fffffc000183bf18, childstack: fffffc000183bed0
do_page_fault: SP: fffffc000183bbc0
sys_exit_group: SP: fffffc00028d3ef0
do_entUnaUser: SP: fffffc000292fe70
do_entArith: SP: fffffc0001d7fee0
do_entIF: SP: fffffc0002827ee0
for the relevant situations (except for `entDbg', but that's analogous and
largely unused anyway).
Can you guys please give it a try?
Maciej
---
arch/alpha/kernel/entry.S | 262 +++++++++++++++++++++++++-------------------
arch/alpha/kernel/head.S | 4
arch/alpha/kernel/process.c | 12 +-
3 files changed, 160 insertions(+), 118 deletions(-)
Index: linux-melmac/arch/alpha/kernel/entry.S
===================================================================
--- linux-melmac.orig/arch/alpha/kernel/entry.S
+++ linux-melmac/arch/alpha/kernel/entry.S
@@ -16,7 +16,7 @@
.cfi_sections .debug_frame
/* Stack offsets. */
-#define SP_OFF 184
+#define SP_OFF 192
#define SWITCH_STACK_SIZE 64
.macro CFI_START_OSF_FRAME func
@@ -52,80 +52,80 @@
.macro SAVE_ALL
subq $sp, SP_OFF, $sp
.cfi_adjust_cfa_offset SP_OFF
- stq $0, 0($sp)
- stq $1, 8($sp)
- stq $2, 16($sp)
- stq $3, 24($sp)
- stq $4, 32($sp)
- stq $28, 144($sp)
- .cfi_rel_offset $0, 0
- .cfi_rel_offset $1, 8
- .cfi_rel_offset $2, 16
- .cfi_rel_offset $3, 24
- .cfi_rel_offset $4, 32
- .cfi_rel_offset $28, 144
+ stq $0, 8($sp)
+ stq $1, 16($sp)
+ stq $2, 24($sp)
+ stq $3, 32($sp)
+ stq $4, 40($sp)
+ stq $28, 152($sp)
+ .cfi_rel_offset $0, 8
+ .cfi_rel_offset $1, 16
+ .cfi_rel_offset $2, 24
+ .cfi_rel_offset $3, 32
+ .cfi_rel_offset $4, 40
+ .cfi_rel_offset $28, 152
lda $2, alpha_mv
- stq $5, 40($sp)
- stq $6, 48($sp)
- stq $7, 56($sp)
- stq $8, 64($sp)
- stq $19, 72($sp)
- stq $20, 80($sp)
- stq $21, 88($sp)
+ stq $5, 48($sp)
+ stq $6, 56($sp)
+ stq $7, 64($sp)
+ stq $8, 72($sp)
+ stq $19, 80($sp)
+ stq $20, 88($sp)
+ stq $21, 96($sp)
ldq $2, HAE_CACHE($2)
- stq $22, 96($sp)
- stq $23, 104($sp)
- stq $24, 112($sp)
- stq $25, 120($sp)
- stq $26, 128($sp)
- stq $27, 136($sp)
- stq $2, 152($sp)
- stq $16, 160($sp)
- stq $17, 168($sp)
- stq $18, 176($sp)
- .cfi_rel_offset $5, 40
- .cfi_rel_offset $6, 48
- .cfi_rel_offset $7, 56
- .cfi_rel_offset $8, 64
- .cfi_rel_offset $19, 72
- .cfi_rel_offset $20, 80
- .cfi_rel_offset $21, 88
- .cfi_rel_offset $22, 96
- .cfi_rel_offset $23, 104
- .cfi_rel_offset $24, 112
- .cfi_rel_offset $25, 120
- .cfi_rel_offset $26, 128
- .cfi_rel_offset $27, 136
+ stq $22, 104($sp)
+ stq $23, 112($sp)
+ stq $24, 120($sp)
+ stq $25, 128($sp)
+ stq $26, 136($sp)
+ stq $27, 144($sp)
+ stq $2, 160($sp)
+ stq $16, 168($sp)
+ stq $17, 176($sp)
+ stq $18, 184($sp)
+ .cfi_rel_offset $5, 48
+ .cfi_rel_offset $6, 56
+ .cfi_rel_offset $7, 64
+ .cfi_rel_offset $8, 72
+ .cfi_rel_offset $19, 80
+ .cfi_rel_offset $20, 88
+ .cfi_rel_offset $21, 96
+ .cfi_rel_offset $22, 104
+ .cfi_rel_offset $23, 112
+ .cfi_rel_offset $24, 120
+ .cfi_rel_offset $25, 128
+ .cfi_rel_offset $26, 136
+ .cfi_rel_offset $27, 144
.endm
.macro RESTORE_ALL
lda $19, alpha_mv
- ldq $0, 0($sp)
- ldq $1, 8($sp)
- ldq $2, 16($sp)
- ldq $3, 24($sp)
- ldq $21, 152($sp)
+ ldq $0, 8($sp)
+ ldq $1, 16($sp)
+ ldq $2, 24($sp)
+ ldq $3, 32($sp)
+ ldq $21, 160($sp)
ldq $20, HAE_CACHE($19)
- ldq $4, 32($sp)
- ldq $5, 40($sp)
- ldq $6, 48($sp)
- ldq $7, 56($sp)
+ ldq $4, 40($sp)
+ ldq $5, 48($sp)
+ ldq $6, 56($sp)
+ ldq $7, 64($sp)
subq $20, $21, $20
- ldq $8, 64($sp)
+ ldq $8, 72($sp)
beq $20, 99f
ldq $20, HAE_REG($19)
stq $21, HAE_CACHE($19)
stq $21, 0($20)
-99: ldq $19, 72($sp)
- ldq $20, 80($sp)
- ldq $21, 88($sp)
- ldq $22, 96($sp)
- ldq $23, 104($sp)
- ldq $24, 112($sp)
- ldq $25, 120($sp)
- ldq $26, 128($sp)
- ldq $27, 136($sp)
- ldq $28, 144($sp)
+99: ldq $19, 80($sp)
+ ldq $20, 88($sp)
+ ldq $21, 96($sp)
+ ldq $22, 104($sp)
+ ldq $23, 112($sp)
+ ldq $24, 120($sp)
+ ldq $25, 128($sp)
+ ldq $26, 136($sp)
+ ldq $27, 144($sp)
+ ldq $28, 152($sp)
addq $sp, SP_OFF, $sp
.cfi_restore $0
.cfi_restore $1
@@ -152,6 +152,26 @@
.macro DO_SWITCH_STACK
bsr $1, do_switch_stack
.cfi_adjust_cfa_offset SWITCH_STACK_SIZE
+ .cfi_rel_offset $9, 8
+ .cfi_rel_offset $10, 16
+ .cfi_rel_offset $11, 24
+ .cfi_rel_offset $12, 32
+ .cfi_rel_offset $13, 40
+ .cfi_rel_offset $14, 48
+ .cfi_rel_offset $15, 56
+.endm
+
+.macro DO_SWITCH_STACK_0
+ lda $sp, -SWITCH_STACK_SIZE($sp)
+ .cfi_adjust_cfa_offset -SWITCH_STACK_SIZE
+ stq $9, 0($sp)
+ stq $10, 8($sp)
+ stq $11, 16($sp)
+ stq $12, 24($sp)
+ stq $13, 32($sp)
+ stq $14, 40($sp)
+ stq $15, 48($sp)
+ stq $26, 56($sp)
.cfi_rel_offset $9, 0
.cfi_rel_offset $10, 8
.cfi_rel_offset $11, 16
@@ -173,6 +193,26 @@
.cfi_adjust_cfa_offset -SWITCH_STACK_SIZE
.endm
+.macro UNDO_SWITCH_STACK_0
+ ldq $9, 0($sp)
+ ldq $10, 8($sp)
+ ldq $11, 16($sp)
+ ldq $12, 24($sp)
+ ldq $13, 32($sp)
+ ldq $14, 40($sp)
+ ldq $15, 48($sp)
+ ldq $26, 56($sp)
+ .cfi_restore $9
+ .cfi_restore $10
+ .cfi_restore $11
+ .cfi_restore $12
+ .cfi_restore $13
+ .cfi_restore $14
+ .cfi_restore $15
+ lda $sp, SWITCH_STACK_SIZE($sp)
+ .cfi_adjust_cfa_offset SWITCH_STACK_SIZE
+.endm
+
/*
* Non-syscall kernel entry points.
*/
@@ -182,7 +222,7 @@ CFI_START_OSF_FRAME entInt
lda $8, 0x3fff
lda $26, ret_from_sys_call
bic $sp, $8, $8
- mov $sp, $19
+ addq $sp, 8, $19
jsr $31, do_entInt
CFI_END_OSF_FRAME entInt
@@ -191,15 +231,15 @@ CFI_START_OSF_FRAME entArith
lda $8, 0x3fff
lda $26, ret_from_sys_call
bic $sp, $8, $8
- mov $sp, $18
+ addq $sp, 8, $18
jsr $31, do_entArith
CFI_END_OSF_FRAME entArith
CFI_START_OSF_FRAME entMM
SAVE_ALL
/* save $9 - $15 so the inline exception code can manipulate them. */
- subq $sp, 56, $sp
- .cfi_adjust_cfa_offset 56
+ subq $sp, 48, $sp
+ .cfi_adjust_cfa_offset 48
stq $9, 0($sp)
stq $10, 8($sp)
stq $11, 16($sp)
@@ -227,7 +267,7 @@ CFI_START_OSF_FRAME entMM
ldq $13, 32($sp)
ldq $14, 40($sp)
ldq $15, 48($sp)
- addq $sp, 56, $sp
+ addq $sp, 48, $sp
.cfi_restore $9
.cfi_restore $10
.cfi_restore $11
@@ -235,7 +275,7 @@ CFI_START_OSF_FRAME entMM
.cfi_restore $13
.cfi_restore $14
.cfi_restore $15
- .cfi_adjust_cfa_offset -56
+ .cfi_adjust_cfa_offset -48
/* finish up the syscall as normal. */
br ret_from_sys_call
CFI_END_OSF_FRAME entMM
@@ -245,7 +285,7 @@ CFI_START_OSF_FRAME entIF
lda $8, 0x3fff
lda $26, ret_from_sys_call
bic $sp, $8, $8
- mov $sp, $17
+ addq $sp, 8, $17
jsr $31, do_entIF
CFI_END_OSF_FRAME entIF
@@ -382,8 +422,8 @@ CFI_START_OSF_FRAME entUna
.cfi_restore $0
.cfi_adjust_cfa_offset -256
SAVE_ALL /* setup normal kernel stack */
- lda $sp, -56($sp)
- .cfi_adjust_cfa_offset 56
+ lda $sp, -48($sp)
+ .cfi_adjust_cfa_offset 48
stq $9, 0($sp)
stq $10, 8($sp)
stq $11, 16($sp)
@@ -409,7 +449,7 @@ CFI_START_OSF_FRAME entUna
ldq $13, 32($sp)
ldq $14, 40($sp)
ldq $15, 48($sp)
- lda $sp, 56($sp)
+ lda $sp, 48($sp)
.cfi_restore $9
.cfi_restore $10
.cfi_restore $11
@@ -417,7 +457,7 @@ CFI_START_OSF_FRAME entUna
.cfi_restore $13
.cfi_restore $14
.cfi_restore $15
- .cfi_adjust_cfa_offset -56
+ .cfi_adjust_cfa_offset -48
br ret_from_sys_call
CFI_END_OSF_FRAME entUna
@@ -426,7 +466,7 @@ CFI_START_OSF_FRAME entDbg
lda $8, 0x3fff
lda $26, ret_from_sys_call
bic $sp, $8, $8
- mov $sp, $16
+ addq $sp, 8, $16
jsr $31, do_entDbg
CFI_END_OSF_FRAME entDbg
@@ -478,8 +518,8 @@ CFI_END_OSF_FRAME entDbg
ldgp $gp, 0($26)
blt $0, $syscall_error /* the call failed */
$ret_success:
- stq $0, 0($sp)
- stq $31, 72($sp) /* a3=0 => no error */
+ stq $0, 8($sp)
+ stq $31, 80($sp) /* a3=0 => no error */
.align 4
.globl ret_from_sys_call
@@ -520,15 +560,15 @@ CFI_END_OSF_FRAME entDbg
* frame to indicate that a negative return value wasn't an
* error number..
*/
- ldq $18, 0($sp) /* old syscall nr (zero if success) */
+ ldq $18, 8($sp) /* old syscall nr (zero if success) */
beq $18, $ret_success
- ldq $19, 72($sp) /* .. and this a3 */
+ ldq $19, 80($sp) /* .. and this a3 */
subq $31, $0, $0 /* with error in v0 */
addq $31, 1, $1 /* set a3 for errno return */
- stq $0, 0($sp)
+ stq $0, 8($sp)
mov $31, $26 /* tell "ret_from_sys_call" we can restart */
- stq $1, 72($sp) /* a3 for return */
+ stq $1, 80($sp) /* a3 for return */
br ret_from_sys_call
/*
@@ -559,7 +599,7 @@ CFI_END_OSF_FRAME entDbg
br ret_to_user
$work_notifysig:
- mov $sp, $16
+ addq $sp, 8, $16
DO_SWITCH_STACK
jsr $26, do_work_pending
UNDO_SWITCH_STACK
@@ -589,9 +629,9 @@ CFI_END_OSF_FRAME entDbg
ldq $16, SP_OFF+24($sp)
ldq $17, SP_OFF+32($sp)
ldq $18, SP_OFF+40($sp)
- ldq $19, 72($sp)
- ldq $20, 80($sp)
- ldq $21, 88($sp)
+ ldq $19, 80($sp)
+ ldq $20, 88($sp)
+ ldq $21, 96($sp)
/* get the system call pointer.. */
lda $1, NR_syscalls($31)
@@ -608,8 +648,8 @@ CFI_END_OSF_FRAME entDbg
/* check return.. */
blt $0, $strace_error /* the call failed */
$strace_success:
- stq $31, 72($sp) /* a3=0 => no error */
- stq $0, 0($sp) /* save return value */
+ stq $31, 80($sp) /* a3=0 => no error */
+ stq $0, 8($sp) /* save return value */
DO_SWITCH_STACK
jsr $26, syscall_trace_leave
@@ -618,14 +658,14 @@ CFI_END_OSF_FRAME entDbg
.align 3
$strace_error:
- ldq $18, 0($sp) /* old syscall nr (zero if success) */
+ ldq $18, 8($sp) /* old syscall nr (zero if success) */
beq $18, $strace_success
- ldq $19, 72($sp) /* .. and this a3 */
+ ldq $19, 80($sp) /* .. and this a3 */
subq $31, $0, $0 /* with error in v0 */
addq $31, 1, $1 /* set a3 for errno return */
- stq $0, 0($sp)
- stq $1, 72($sp) /* a3 for return */
+ stq $0, 8($sp)
+ stq $1, 80($sp) /* a3 for return */
DO_SWITCH_STACK
mov $18, $9 /* save old syscall number */
@@ -652,14 +692,14 @@ CFI_END_OSF_FRAME entSys
do_switch_stack:
lda $sp, -SWITCH_STACK_SIZE($sp)
.cfi_adjust_cfa_offset SWITCH_STACK_SIZE
- stq $9, 0($sp)
- stq $10, 8($sp)
- stq $11, 16($sp)
- stq $12, 24($sp)
- stq $13, 32($sp)
- stq $14, 40($sp)
- stq $15, 48($sp)
- stq $26, 56($sp)
+ stq $9, 8($sp)
+ stq $10, 16($sp)
+ stq $11, 24($sp)
+ stq $12, 32($sp)
+ stq $13, 40($sp)
+ stq $14, 48($sp)
+ stq $15, 56($sp)
+ stq $26, 64($sp)
ret $31, ($1), 1
.cfi_endproc
.size do_switch_stack, .-do_switch_stack
@@ -670,14 +710,14 @@ CFI_END_OSF_FRAME entSys
.cfi_def_cfa $sp, 0
.cfi_register 64, $1
undo_switch_stack:
- ldq $9, 0($sp)
- ldq $10, 8($sp)
- ldq $11, 16($sp)
- ldq $12, 24($sp)
- ldq $13, 32($sp)
- ldq $14, 40($sp)
- ldq $15, 48($sp)
- ldq $26, 56($sp)
+ ldq $9, 8($sp)
+ ldq $10, 16($sp)
+ ldq $11, 24($sp)
+ ldq $12, 32($sp)
+ ldq $13, 40($sp)
+ ldq $14, 48($sp)
+ ldq $15, 56($sp)
+ ldq $26, 64($sp)
lda $sp, SWITCH_STACK_SIZE($sp)
ret $31, ($1), 1
.cfi_endproc
@@ -733,7 +773,7 @@ CFI_END_OSF_FRAME entSys
.type alpha_switch_to, @function
.cfi_startproc
alpha_switch_to:
- DO_SWITCH_STACK
+ DO_SWITCH_STACK_0
ldl $1, TI_STATUS($8)
and $1, TS_RESTORE_FP, $3
bne $3, 1f
@@ -745,7 +785,7 @@ CFI_END_OSF_FRAME entSys
1:
call_pal PAL_swpctx
lda $8, 0x3fff
- UNDO_SWITCH_STACK
+ UNDO_SWITCH_STACK_0
bic $sp, $8, $8
mov $17, $0
ret
@@ -802,7 +842,7 @@ CFI_END_OSF_FRAME entSys
bsr $26, __save_fpu
1:
jsr $26, sys_\name
- ldq $26, 56($sp)
+ ldq $26, 64($sp)
lda $sp, SWITCH_STACK_SIZE($sp)
ret
.end alpha_\name
@@ -847,6 +887,6 @@ sigreturn_like rt_sigreturn
*/
lda $0, -ENOSYS
unop
- stq $0, 0($sp)
+ stq $0, 8($sp)
ret
.end alpha_syscall_zero
Index: linux-melmac/arch/alpha/kernel/head.S
===================================================================
--- linux-melmac.orig/arch/alpha/kernel/head.S
+++ linux-melmac/arch/alpha/kernel/head.S
@@ -25,8 +25,8 @@ __HEAD
1: ldgp $29,0($27)
/* We need to get current_task_info loaded up... */
lda $8,init_thread_union
- /* ... and find our stack ... */
- lda $30,0x4000 - SIZEOF_PT_REGS($8)
+ /* ... and find our stack, ensuring 16-byte alignment ... */
+ lda $30,0x4000 - (SIZEOF_PT_REGS + 8)($8)
/* ... and then we can start the kernel. */
jsr $26,start_kernel
call_pal PAL_halt
Index: linux-melmac/arch/alpha/kernel/process.c
===================================================================
--- linux-melmac.orig/arch/alpha/kernel/process.c
+++ linux-melmac/arch/alpha/kernel/process.c
@@ -240,7 +240,8 @@ int copy_thread(struct task_struct *p, c
struct pt_regs *regs = current_pt_regs();
struct switch_stack *childstack, *stack;
- childstack = ((struct switch_stack *) childregs) - 1;
+ childstack =
+ ((struct switch_stack *)((unsigned long *)childregs - 1)) - 1;
childti->pcb.ksp = (unsigned long) childstack;
childti->pcb.flags = 1; /* set FEN, clear everything else */
childti->status |= TS_SAVED_FP | TS_RESTORE_FP;
@@ -248,7 +249,8 @@ int copy_thread(struct task_struct *p, c
if (unlikely(args->fn)) {
/* kernel thread */
memset(childstack, 0,
- sizeof(struct switch_stack) + sizeof(struct pt_regs));
+ (sizeof(struct switch_stack) +
+ sizeof(unsigned long) + sizeof(struct pt_regs)));
childstack->r26 = (unsigned long) ret_from_kernel_thread;
childstack->r9 = (unsigned long) args->fn;
childstack->r10 = (unsigned long) args->fn_arg;
@@ -345,7 +347,7 @@ int elf_core_copy_task_fpregs(struct tas
* pointer is the 6th saved long on the kernel stack and that the
* saved return address is the first long in the frame. This all
* holds provided the thread blocked through a call to schedule() ($15
- * is the frame pointer in schedule() and $15 is saved at offset 48 by
+ * is the frame pointer in schedule() and $15 is saved at offset 56 by
* entry.S:do_switch_stack).
*
* Under heavy swap load I've seen this lose in an ugly way. So do
@@ -360,8 +362,8 @@ thread_saved_pc(struct task_struct *t)
unsigned long base = (unsigned long)task_stack_page(t);
unsigned long fp, sp = task_thread_info(t)->pcb.ksp;
- if (sp > base && sp+6*8 < base + 16*1024) {
- fp = ((unsigned long*)sp)[6];
+ if (sp > base && sp + 7 * 8 < base + 16 * 1024) {
+ fp = ((unsigned long *)sp)[7];
if (fp > sp && fp < base + 16*1024)
return *(unsigned long *)fp;
}