Re: Question: livepatch failed for new fork() task stack unreliable
On Tue, Jun 30, 2020 at 09:04:12PM +0800, Wangshaobo (bobo) wrote: > > 在 2020/6/5 9:51, Josh Poimboeuf 写道: > > On Fri, Jun 05, 2020 at 09:26:42AM +0800, Wangshaobo (bobo) wrote: > > > > > So, I want to ask is there any side effects if i modify like this ? > > > > > this > > > > > modification is based on > > > > > > > > > > your fix. It looks like ok with proper test. > > > > > > > > > > diff --git a/arch/x86/kernel/unwind_orc.c > > > > > b/arch/x86/kernel/unwind_orc.c > > > > > index e9cc182aa97e..ecce5051e8fd 100644 > > > > > --- a/arch/x86/kernel/unwind_orc.c > > > > > +++ b/arch/x86/kernel/unwind_orc.c > > > > > @@ -620,6 +620,7 @@ void __unwind_start(struct unwind_state *state, > > > > > struct > > > > > task_struct *task, > > > > > state->sp = task->thread.sp; > > > > > state->bp = READ_ONCE_NOCHECK(frame->bp); > > > > > state->ip = READ_ONCE_NOCHECK(frame->ret_addr); > > > > > + state->signal = ((void *)state->ip == ret_from_fork); > > > > > } > > > > > > > > > > diff --git a/arch/x86/kernel/unwind_orc.c > > > > > b/arch/x86/kernel/unwind_orc.c > > > > > index 7f969b2d240f..d7396431261a 100644 > > > > > --- a/arch/x86/kernel/unwind_orc.c > > > > > +++ b/arch/x86/kernel/unwind_orc.c > > > > > @@ -540,7 +540,7 @@ bool unwind_next_frame(struct unwind_state *state) > > > > > state->sp = sp; > > > > > state->regs = NULL; > > > > > state->prev_regs = NULL; > > > > > - state->signal = ((void *)state->ip == ret_from_fork); > > > > > + state->signal = false; > > > > > break; > > > > Yes that's correct. > > > Hi, josh > > > > > > Could i ask when are you free to send the patch, all the tests are passed > > > by. > > I want to run some regression tests, so it will probably be next week. Sorry, I was away for a bit and I didn't get a chance to send the patch. I should hopefully have it ready soon. -- Josh
Re: Question: livepatch failed for new fork() task stack unreliable
在 2020/6/5 9:51, Josh Poimboeuf 写道: On Fri, Jun 05, 2020 at 09:26:42AM +0800, Wangshaobo (bobo) wrote: So, I want to ask is there any side effects if i modify like this ? this modification is based on your fix. It looks like ok with proper test. diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c index e9cc182aa97e..ecce5051e8fd 100644 --- a/arch/x86/kernel/unwind_orc.c +++ b/arch/x86/kernel/unwind_orc.c @@ -620,6 +620,7 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task, state->sp = task->thread.sp; state->bp = READ_ONCE_NOCHECK(frame->bp); state->ip = READ_ONCE_NOCHECK(frame->ret_addr); + state->signal = ((void *)state->ip == ret_from_fork); } diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c index 7f969b2d240f..d7396431261a 100644 --- a/arch/x86/kernel/unwind_orc.c +++ b/arch/x86/kernel/unwind_orc.c @@ -540,7 +540,7 @@ bool unwind_next_frame(struct unwind_state *state) state->sp = sp; state->regs = NULL; state->prev_regs = NULL; - state->signal = ((void *)state->ip == ret_from_fork); + state->signal = false; break; Yes that's correct. Hi, josh Could i ask when are you free to send the patch, all the tests are passed by. I want to run some regression tests, so it will probably be next week. Hi, josh did you send this patch, I haven't received it up to now, so i ask u to confirm again. thanks, Wang ShaoBo
Re: Question: livepatch failed for new fork() task stack unreliable
On Fri, Jun 05, 2020 at 09:26:42AM +0800, Wangshaobo (bobo) wrote: > > > So, I want to ask is there any side effects if i modify like this ? this > > > modification is based on > > > > > > your fix. It looks like ok with proper test. > > > > > > diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c > > > index e9cc182aa97e..ecce5051e8fd 100644 > > > --- a/arch/x86/kernel/unwind_orc.c > > > +++ b/arch/x86/kernel/unwind_orc.c > > > @@ -620,6 +620,7 @@ void __unwind_start(struct unwind_state *state, struct > > > task_struct *task, > > > state->sp = task->thread.sp; > > > state->bp = READ_ONCE_NOCHECK(frame->bp); > > > state->ip = READ_ONCE_NOCHECK(frame->ret_addr); > > > + state->signal = ((void *)state->ip == ret_from_fork); > > > } > > > > > > diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c > > > index 7f969b2d240f..d7396431261a 100644 > > > --- a/arch/x86/kernel/unwind_orc.c > > > +++ b/arch/x86/kernel/unwind_orc.c > > > @@ -540,7 +540,7 @@ bool unwind_next_frame(struct unwind_state *state) > > > state->sp = sp; > > > state->regs = NULL; > > > state->prev_regs = NULL; > > > - state->signal = ((void *)state->ip == ret_from_fork); > > > + state->signal = false; > > > break; > > Yes that's correct. > > Hi, josh > > Could i ask when are you free to send the patch, all the tests are passed > by. I want to run some regression tests, so it will probably be next week. -- Josh
Re: Question: livepatch failed for new fork() task stack unreliable
在 2020/6/4 10:40, Josh Poimboeuf 写道: On Thu, Jun 04, 2020 at 09:24:55AM +0800, Wangshaobo (bobo) wrote: 在 2020/6/3 23:33, Josh Poimboeuf 写道: On Wed, Jun 03, 2020 at 10:06:07PM +0800, Wangshaobo (bobo) wrote: To be honest, I don't remember what I meant by sibling calls. They don't even leave anything on the stack. For noreturns, the code might be laid out like this: func1: ... call noreturn_foo func2: func2 is immediately after the call to noreturn_foo. So the return address on the stack will actually be 'func2'. We want to retrieve the ORC data for the call instruction (inside func1), instead of the instruction at the beginning of func2. I should probably update that comment. So, I want to ask is there any side effects if i modify like this ? this modification is based on your fix. It looks like ok with proper test. diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c index e9cc182aa97e..ecce5051e8fd 100644 --- a/arch/x86/kernel/unwind_orc.c +++ b/arch/x86/kernel/unwind_orc.c @@ -620,6 +620,7 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task, state->sp = task->thread.sp; state->bp = READ_ONCE_NOCHECK(frame->bp); state->ip = READ_ONCE_NOCHECK(frame->ret_addr); + state->signal = ((void *)state->ip == ret_from_fork); } diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c index 7f969b2d240f..d7396431261a 100644 --- a/arch/x86/kernel/unwind_orc.c +++ b/arch/x86/kernel/unwind_orc.c @@ -540,7 +540,7 @@ bool unwind_next_frame(struct unwind_state *state) state->sp = sp; state->regs = NULL; state->prev_regs = NULL; - state->signal = ((void *)state->ip == ret_from_fork); + state->signal = false; break; Yes that's correct. Hi, josh Could i ask when are you free to send the patch, all the tests are passed by. thanks for your help. Wang ShaoBo
Re: Question: livepatch failed for new fork() task stack unreliable
On Thu, Jun 04, 2020 at 09:24:55AM +0800, Wangshaobo (bobo) wrote: > > 在 2020/6/3 23:33, Josh Poimboeuf 写道: > > On Wed, Jun 03, 2020 at 10:06:07PM +0800, Wangshaobo (bobo) wrote: > > To be honest, I don't remember what I meant by sibling calls. They > > don't even leave anything on the stack. > > > > For noreturns, the code might be laid out like this: > > > > func1: > > ... > > call noreturn_foo > > func2: > > > > func2 is immediately after the call to noreturn_foo. So the return > > address on the stack will actually be 'func2'. We want to retrieve the > > ORC data for the call instruction (inside func1), instead of the > > instruction at the beginning of func2. > > > > I should probably update that comment. > > So, I want to ask is there any side effects if i modify like this ? this > modification is based on > > your fix. It looks like ok with proper test. > > diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c > index e9cc182aa97e..ecce5051e8fd 100644 > --- a/arch/x86/kernel/unwind_orc.c > +++ b/arch/x86/kernel/unwind_orc.c > @@ -620,6 +620,7 @@ void __unwind_start(struct unwind_state *state, struct > task_struct *task, > state->sp = task->thread.sp; > state->bp = READ_ONCE_NOCHECK(frame->bp); > state->ip = READ_ONCE_NOCHECK(frame->ret_addr); > + state->signal = ((void *)state->ip == ret_from_fork); > } > > diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c > index 7f969b2d240f..d7396431261a 100644 > --- a/arch/x86/kernel/unwind_orc.c > +++ b/arch/x86/kernel/unwind_orc.c > @@ -540,7 +540,7 @@ bool unwind_next_frame(struct unwind_state *state) > state->sp = sp; > state->regs = NULL; > state->prev_regs = NULL; > - state->signal = ((void *)state->ip == ret_from_fork); > + state->signal = false; > break; Yes that's correct. -- Josh
Re: Question: livepatch failed for new fork() task stack unreliable
在 2020/6/3 23:33, Josh Poimboeuf 写道: On Wed, Jun 03, 2020 at 10:06:07PM +0800, Wangshaobo (bobo) wrote: To be honest, I don't remember what I meant by sibling calls. They don't even leave anything on the stack. For noreturns, the code might be laid out like this: func1: ... call noreturn_foo func2: func2 is immediately after the call to noreturn_foo. So the return address on the stack will actually be 'func2'. We want to retrieve the ORC data for the call instruction (inside func1), instead of the instruction at the beginning of func2. I should probably update that comment. So, I want to ask is there any side effects if i modify like this ? this modification is based on your fix. It looks like ok with proper test. diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c index e9cc182aa97e..ecce5051e8fd 100644 --- a/arch/x86/kernel/unwind_orc.c +++ b/arch/x86/kernel/unwind_orc.c @@ -620,6 +620,7 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task, state->sp = task->thread.sp; state->bp = READ_ONCE_NOCHECK(frame->bp); state->ip = READ_ONCE_NOCHECK(frame->ret_addr); + state->signal = ((void *)state->ip == ret_from_fork); } diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c index 7f969b2d240f..d7396431261a 100644 --- a/arch/x86/kernel/unwind_orc.c +++ b/arch/x86/kernel/unwind_orc.c @@ -540,7 +540,7 @@ bool unwind_next_frame(struct unwind_state *state) state->sp = sp; state->regs = NULL; state->prev_regs = NULL; - state->signal = ((void *)state->ip == ret_from_fork); + state->signal = false; break; thanks, Wang ShaoBo
Re: Question: livepatch failed for new fork() task stack unreliable
On Wed, Jun 03, 2020 at 10:06:07PM +0800, Wangshaobo (bobo) wrote: > Today I test your fix, but arch_stack_walk_reliable() still return failed > sometimes, I > > found one of three scenarios mentioned failed: > > > 1. user task (just fork) but not been scheduled > > test FAILED > > it is because unwind_next_frame() get the first frame, this time > state->signal is false, and then return > > failed in the same place for ret_from_fork has not executed at all. Oops - I meant to do it in __unwind_start (as you discovered). > 2. user task (just fork) start excuting ret_from_fork() till schedule_tail > but not UNWIND_HINT_REGS > > test condition :loop fork() in current system > > result : SUCCESS, > > it looks like this modification works for my perspective : > > - /* Success path for non-user tasks, i.e. kthreads and idle > tasks */ > - if (!(task->flags & (PF_KTHREAD | PF_IDLE))) > - return -EINVAL; > but is this possible to miss one invalid judgement condition ? (1) I'm not sure I understand your question, but I think this change shouldn't break anything else because the ORC unwinder is careful to always set an error if it doesn't reach the end of the stack, especially with my recent ORC fixes which went into 5.7. > 3. call_usermodehelper_exec_async > > test condition :loop call call_usermodehelper() in a module selfmade. > > result : SUCCESS, > > it looks state->signal==true works when unwind_next_frame() gets the end > ret_from_fork() frame, > > but i don't know how does it work, i am confused by this sentences, how > does the comment means sibling calls and > > calls to noreturn functions? (2) > > /* > * Find the orc_entry associated with the text address. > * > * Decrement call return addresses by one so they work for sibling > * calls and calls to noreturn functions. > */ > orc = orc_find(state->signal ? state->ip : state->ip - 1); > if (!orc) { To be honest, I don't remember what I meant by sibling calls. They don't even leave anything on the stack. For noreturns, the code might be laid out like this: func1: ... call noreturn_foo func2: func2 is immediately after the call to noreturn_foo. So the return address on the stack will actually be 'func2'. We want to retrieve the ORC data for the call instruction (inside func1), instead of the instruction at the beginning of func2. I should probably update that comment. -- Josh
Re: Question: livepatch failed for new fork() task stack unreliable
在 2020/6/2 21:14, Josh Poimboeuf 写道: On Tue, Jun 02, 2020 at 09:22:30AM +0800, Wangshaobo (bobo) wrote: so i think this question is related to ORC unwinder, could i ask if you have strategy or plan to avoid this problem ? I suspect something like this would fix it (untested): diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c index 6ad43fc44556..8cf95ded1410 100644 --- a/arch/x86/kernel/stacktrace.c +++ b/arch/x86/kernel/stacktrace.c @@ -50,7 +50,7 @@ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, if (regs) { /* Success path for user tasks */ if (user_mode(regs)) - return 0; + break; /* * Kernel mode registers on the stack indicate an @@ -81,10 +81,6 @@ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, if (unwind_error()) return -EINVAL; - /* Success path for non-user tasks, i.e. kthreads and idle tasks */ - if (!(task->flags & (PF_KTHREAD | PF_IDLE))) - return -EINVAL; - return 0; } diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c index 7f969b2d240f..d7396431261a 100644 --- a/arch/x86/kernel/unwind_orc.c +++ b/arch/x86/kernel/unwind_orc.c @@ -540,7 +540,7 @@ bool unwind_next_frame(struct unwind_state *state) state->sp = sp; state->regs = NULL; state->prev_regs = NULL; - state->signal = false; + state->signal = ((void *)state->ip == ret_from_fork); break; case ORC_TYPE_REGS: what a awesome job, thanks a lot, Josh Today I test your fix, but arch_stack_walk_reliable() still return failed sometimes, I found one of three scenarios mentioned failed: 1. user task (just fork) but not been scheduled test FAILED it is because unwind_next_frame() get the first frame, this time state->signal is false, and then return failed in the same place for ret_from_fork has not executed at all. 2. user task (just fork) start excuting ret_from_fork() till schedule_tail but not UNWIND_HINT_REGS test condition :loop fork() in current system result : SUCCESS, it looks like this modification works for my perspective : - /* Success path for non-user tasks, i.e. kthreads and idle tasks */ - if (!(task->flags & (PF_KTHREAD | PF_IDLE))) - return -EINVAL; but is this possible to miss one invalid judgement condition ? (1) 3. call_usermodehelper_exec_async test condition :loop call call_usermodehelper() in a module selfmade. result : SUCCESS, it looks state->signal==true works when unwind_next_frame() gets the end ret_from_fork() frame, but i don't know how does it work, i am confused by this sentences, how does the comment means sibling calls and calls to noreturn functions? (2) /* * Find the orc_entry associated with the text address. * * Decrement call return addresses by one so they work for sibling * calls and calls to noreturn functions. */ orc = orc_find(state->signal ? state->ip : state->ip - 1); if (!orc) { So i slightly modify your code, i move state->signal = ((void *)state->ip == ret_from_fork) to unwind_start() and render unwind_next_frame() remain the same as before: diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c index e9cc182aa97e..ecce5051e8fd 100644 --- a/arch/x86/kernel/unwind_orc.c +++ b/arch/x86/kernel/unwind_orc.c @@ -620,6 +620,7 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task, state->sp = task->thread.sp; state->bp = READ_ONCE_NOCHECK(frame->bp); state->ip = READ_ONCE_NOCHECK(frame->ret_addr); + state->signal = ((void *)state->ip == ret_from_fork); } diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c index 7f969b2d240f..d7396431261a 100644 --- a/arch/x86/kernel/unwind_orc.c +++ b/arch/x86/kernel/unwind_orc.c @@ -540,7 +540,7 @@ bool unwind_next_frame(struct unwind_state *state) state->sp = sp; state->regs = NULL; state->prev_regs = NULL; - state->signal = ((void *)state->ip == ret_from_fork); + state->signal = false; break; After modification all the three scenarios are captured and no longer return failed, but i don't know how does it affect the scenarios 3, because current frame->ret_addr(the first frame) is not ret_from_fork, it should return failed as scenarios1, but it didn't , i really want to know the reason. (3) thanks again Wang ShaoBo
Re: Question: livepatch failed for new fork() task stack unreliable
On Tue, Jun 02, 2020 at 09:22:30AM +0800, Wangshaobo (bobo) wrote: > so i think this question is related to ORC unwinder, could i ask if you have > strategy or plan to avoid this problem ? I suspect something like this would fix it (untested): diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c index 6ad43fc44556..8cf95ded1410 100644 --- a/arch/x86/kernel/stacktrace.c +++ b/arch/x86/kernel/stacktrace.c @@ -50,7 +50,7 @@ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, if (regs) { /* Success path for user tasks */ if (user_mode(regs)) - return 0; + break; /* * Kernel mode registers on the stack indicate an @@ -81,10 +81,6 @@ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, if (unwind_error()) return -EINVAL; - /* Success path for non-user tasks, i.e. kthreads and idle tasks */ - if (!(task->flags & (PF_KTHREAD | PF_IDLE))) - return -EINVAL; - return 0; } diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c index 7f969b2d240f..d7396431261a 100644 --- a/arch/x86/kernel/unwind_orc.c +++ b/arch/x86/kernel/unwind_orc.c @@ -540,7 +540,7 @@ bool unwind_next_frame(struct unwind_state *state) state->sp = sp; state->regs = NULL; state->prev_regs = NULL; - state->signal = false; + state->signal = ((void *)state->ip == ret_from_fork); break; case ORC_TYPE_REGS:
Re: Question: livepatch failed for new fork() task stack unreliable
在 2020/6/2 2:05, Josh Poimboeuf 写道: On Sat, May 30, 2020 at 10:21:19AM +0800, Wangshaobo (bobo) wrote: 1) when a user mode task just fork start excuting ret_from_fork() till schedule_tail, unwind_next_frame found orc->sp_reg is ORC_REG_UNDEFINED but orc->end not equals zero, this time arch_stack_walk_reliable() terminates it's backtracing loop for unwind_done() return true. then 'if (!(task->flags & (PF_KTHREAD | PF_IDLE)))' in arch_stack_walk_reliable() true and return -EINVAL after. * The stack trace looks like that: ret_from_fork -=> UNWIND_HINT_EMPTY -=> schedule_tail /* schedule out */ ... -=> UNWIND_HINT_REGS /* UNDO */ Yes, makes sense. 2) when using call_usermodehelper_exec_async() to create a user mode task, ret_from_fork() still not exec whereas the task has been scheduled in __schedule(), at this time, orc->sp_reg is ORC_REG_UNDEFINED but orc->end equals zero, unwind_error() return true and also terminates arch_stack_walk_reliable()'s backtracing loop, end up return from 'if (unwind_error())' branch. * The stack trace looks like that: -=> call_usermodehelper_exec -=> do_exec -=> search_binary_handler -=> load_elf_binary -=> elf_map -=> vm_mmap_pgoff -=> down_write_killable -=> _cond_resched -=> __schedule /* scheduled to work */ -=> ret_from_fork /* UNDO */ I don't quite follow the stacktrace, but it sounds like the issue is the same as the first one you originally reported: yes, true, same as the first one, the only difference what i want to say is the task has been scheduled but the first one is not. 1) The task was not actually scheduled to excute, at this time UNWIND_HINT_EMPTY in ret_from_fork() has not reset unwind_hint, it's sp_reg and end field remain default value and end up throwing an error in unwind_next_frame() when called by arch_stack_walk_reliable(); Or am I misunderstanding? And to reiterate, these are not "livepatch failures", right? Livepatch doesn't fail when stack_trace_save_tsk_reliable() returns an error. It recovers gracefully and tries again later. yes, you are right, "livepatch failures" only indicates serveral retry failures, we found if frequent fork() happend in current system, it is easier to cause retry but still always end up success. so i think this question is related to ORC unwinder, could i ask if you have strategy or plan to avoid this problem ? thanks, Wang ShaoBo
Re: Question: livepatch failed for new fork() task stack unreliable
On Sat, May 30, 2020 at 10:21:19AM +0800, Wangshaobo (bobo) wrote: > 1) when a user mode task just fork start excuting ret_from_fork() till > schedule_tail, unwind_next_frame found > > orc->sp_reg is ORC_REG_UNDEFINED but orc->end not equals zero, this time > arch_stack_walk_reliable() > > terminates it's backtracing loop for unwind_done() return true. then 'if > (!(task->flags & (PF_KTHREAD | PF_IDLE)))' > > in arch_stack_walk_reliable() true and return -EINVAL after. > > * The stack trace looks like that: > > ret_from_fork > > -=> UNWIND_HINT_EMPTY > > -=> schedule_tail /* schedule out */ > > ... > > -=> UNWIND_HINT_REGS /* UNDO */ Yes, makes sense. > 2) when using call_usermodehelper_exec_async() to create a user mode task, > ret_from_fork() still not exec whereas > > the task has been scheduled in __schedule(), at this time, orc->sp_reg is > ORC_REG_UNDEFINED but orc->end equals zero, > > unwind_error() return true and also terminates arch_stack_walk_reliable()'s > backtracing loop, end up return from > > 'if (unwind_error())' branch. > > * The stack trace looks like that: > > -=> call_usermodehelper_exec > > -=> do_exec > > -=> search_binary_handler > > -=> load_elf_binary > > -=> elf_map > > -=> vm_mmap_pgoff > > -=> down_write_killable > > -=> _cond_resched > > -=> __schedule /* scheduled to work */ > > -=> ret_from_fork /* UNDO */ I don't quite follow the stacktrace, but it sounds like the issue is the same as the first one you originally reported: > 1) The task was not actually scheduled to excute, at this time > UNWIND_HINT_EMPTY in ret_from_fork() has not reset unwind_hint, it's > sp_reg and end field remain default value and end up throwing an error > in unwind_next_frame() when called by arch_stack_walk_reliable(); Or am I misunderstanding? And to reiterate, these are not "livepatch failures", right? Livepatch doesn't fail when stack_trace_save_tsk_reliable() returns an error. It recovers gracefully and tries again later. -- Josh
Re: Question: livepatch failed for new fork() task stack unreliable
On Fri, May 29, 2020 at 06:10:59PM +0800, Wang ShaoBo wrote: > Stack unreliable error is reported by stack_trace_save_tsk_reliable() when > trying > to insmod a hot patch for module modification, this results in frequent > failures > sometimes. We found this 'unreliable' stack is from task just fork. For livepatch, this shouldn't actually be a failure. The patch will just stay in the transition state until after the fork has completed. Which should happen in a reasonable amount of time, right? > 1) The task was not actually scheduled to excute, at this time > UNWIND_HINT_EMPTY in > ret_from_fork() has not reset unwind_hint, it's sp_reg and end field remain > default value > and end up throwing an error in unwind_next_frame() when called by > arch_stack_walk_reliable(); Yes, this seems to be true for forked-but-not-yet-scheduled tasks. I can look at fixing that. I have some ORC cleanups in progress which are related to UNWIND_HINT_EMPTY and the end of the stack. I can add this issue to the list of improvements. > 2) The task has been scheduled but UNWIND_HINT_REGS not finished, at this time > arch_stack_walk_reliable() terminates it's backtracing loop for pt_regs > unknown > and return -EINVAL because it's a user task. Hm, do you see this problem with upstream? It seems like it should work. arch_stack_walk_reliable() has this: /* Success path for user tasks */ if (user_mode(regs)) return 0; Where exactly is the error coming from? -- Josh