Re: Question: livepatch failed for new fork() task stack unreliable

2020-06-30 Thread Josh Poimboeuf
On Tue, Jun 30, 2020 at 09:04:12PM +0800, Wangshaobo (bobo) wrote:
> 
> 在 2020/6/5 9:51, Josh Poimboeuf 写道:
> > On Fri, Jun 05, 2020 at 09:26:42AM +0800, Wangshaobo (bobo) wrote:
> > > > > So, I want to ask is there any side effects if i modify like this ? 
> > > > > this
> > > > > modification is based on
> > > > > 
> > > > > your fix. It looks like ok with proper test.
> > > > > 
> > > > > diff --git a/arch/x86/kernel/unwind_orc.c 
> > > > > b/arch/x86/kernel/unwind_orc.c
> > > > > index e9cc182aa97e..ecce5051e8fd 100644
> > > > > --- a/arch/x86/kernel/unwind_orc.c
> > > > > +++ b/arch/x86/kernel/unwind_orc.c
> > > > > @@ -620,6 +620,7 @@ void __unwind_start(struct unwind_state *state, 
> > > > > struct
> > > > > task_struct *task,
> > > > >       state->sp = task->thread.sp;
> > > > >       state->bp = READ_ONCE_NOCHECK(frame->bp);
> > > > >       state->ip = READ_ONCE_NOCHECK(frame->ret_addr);
> > > > > +  state->signal = ((void *)state->ip == ret_from_fork);
> > > > >       }
> > > > > 
> > > > > diff --git a/arch/x86/kernel/unwind_orc.c 
> > > > > b/arch/x86/kernel/unwind_orc.c
> > > > > index 7f969b2d240f..d7396431261a 100644
> > > > > --- a/arch/x86/kernel/unwind_orc.c
> > > > > +++ b/arch/x86/kernel/unwind_orc.c
> > > > > @@ -540,7 +540,7 @@ bool unwind_next_frame(struct unwind_state *state)
> > > > >    state->sp = sp;
> > > > >    state->regs = NULL;
> > > > >    state->prev_regs = NULL;
> > > > > -    state->signal = ((void *)state->ip == ret_from_fork);
> > > > > +    state->signal = false;
> > > > >    break;
> > > > Yes that's correct.
> > > Hi, josh
> > > 
> > > Could i ask when are you free to send the patch, all the tests are passed
> > > by.
> > I want to run some regression tests, so it will probably be next week.

Sorry, I was away for a bit and I didn't get a chance to send the patch.
I should hopefully have it ready soon.

-- 
Josh



Re: Question: livepatch failed for new fork() task stack unreliable

2020-06-30 Thread Wangshaobo (bobo)



在 2020/6/5 9:51, Josh Poimboeuf 写道:

On Fri, Jun 05, 2020 at 09:26:42AM +0800, Wangshaobo (bobo) wrote:

So, I want to ask is there any side effects if i modify like this ? this
modification is based on

your fix. It looks like ok with proper test.

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index e9cc182aa97e..ecce5051e8fd 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -620,6 +620,7 @@ void __unwind_start(struct unwind_state *state, struct
task_struct *task,
      state->sp = task->thread.sp;
      state->bp = READ_ONCE_NOCHECK(frame->bp);
      state->ip = READ_ONCE_NOCHECK(frame->ret_addr);
+  state->signal = ((void *)state->ip == ret_from_fork);
      }

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 7f969b2d240f..d7396431261a 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -540,7 +540,7 @@ bool unwind_next_frame(struct unwind_state *state)
   state->sp = sp;
   state->regs = NULL;
   state->prev_regs = NULL;
-    state->signal = ((void *)state->ip == ret_from_fork);
+    state->signal = false;
   break;

Yes that's correct.

Hi, josh

Could i ask when are you free to send the patch, all the tests are passed
by.

I want to run some regression tests, so it will probably be next week.


Hi, josh

did you send this patch, I haven't received it up to now, so i ask u to 
confirm again.


thanks,

Wang ShaoBo




Re: Question: livepatch failed for new fork() task stack unreliable

2020-06-04 Thread Josh Poimboeuf
On Fri, Jun 05, 2020 at 09:26:42AM +0800, Wangshaobo (bobo) wrote:
> > > So, I want to ask is there any side effects if i modify like this ? this
> > > modification is based on
> > > 
> > > your fix. It looks like ok with proper test.
> > > 
> > > diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
> > > index e9cc182aa97e..ecce5051e8fd 100644
> > > --- a/arch/x86/kernel/unwind_orc.c
> > > +++ b/arch/x86/kernel/unwind_orc.c
> > > @@ -620,6 +620,7 @@ void __unwind_start(struct unwind_state *state, struct
> > > task_struct *task,
> > >      state->sp = task->thread.sp;
> > >      state->bp = READ_ONCE_NOCHECK(frame->bp);
> > >      state->ip = READ_ONCE_NOCHECK(frame->ret_addr);
> > > +  state->signal = ((void *)state->ip == ret_from_fork);
> > >      }
> > > 
> > > diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
> > > index 7f969b2d240f..d7396431261a 100644
> > > --- a/arch/x86/kernel/unwind_orc.c
> > > +++ b/arch/x86/kernel/unwind_orc.c
> > > @@ -540,7 +540,7 @@ bool unwind_next_frame(struct unwind_state *state)
> > >   state->sp = sp;
> > >   state->regs = NULL;
> > >   state->prev_regs = NULL;
> > > -    state->signal = ((void *)state->ip == ret_from_fork);
> > > +    state->signal = false;
> > >   break;
> > Yes that's correct.
> 
> Hi, josh
> 
> Could i ask when are you free to send the patch, all the tests are passed
> by.

I want to run some regression tests, so it will probably be next week.

-- 
Josh



Re: Question: livepatch failed for new fork() task stack unreliable

2020-06-04 Thread Wangshaobo (bobo)



在 2020/6/4 10:40, Josh Poimboeuf 写道:

On Thu, Jun 04, 2020 at 09:24:55AM +0800, Wangshaobo (bobo) wrote:

在 2020/6/3 23:33, Josh Poimboeuf 写道:

On Wed, Jun 03, 2020 at 10:06:07PM +0800, Wangshaobo (bobo) wrote:
To be honest, I don't remember what I meant by sibling calls.  They
don't even leave anything on the stack.

For noreturns, the code might be laid out like this:

func1:
...
call noreturn_foo
func2:

func2 is immediately after the call to noreturn_foo.  So the return
address on the stack will actually be 'func2'.  We want to retrieve the
ORC data for the call instruction (inside func1), instead of the
instruction at the beginning of func2.

I should probably update that comment.

So, I want to ask is there any side effects if i modify like this ? this
modification is based on

your fix. It looks like ok with proper test.

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index e9cc182aa97e..ecce5051e8fd 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -620,6 +620,7 @@ void __unwind_start(struct unwind_state *state, struct
task_struct *task,
     state->sp = task->thread.sp;
     state->bp = READ_ONCE_NOCHECK(frame->bp);
     state->ip = READ_ONCE_NOCHECK(frame->ret_addr);
+  state->signal = ((void *)state->ip == ret_from_fork);
     }

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 7f969b2d240f..d7396431261a 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -540,7 +540,7 @@ bool unwind_next_frame(struct unwind_state *state)
  state->sp = sp;
  state->regs = NULL;
  state->prev_regs = NULL;
-    state->signal = ((void *)state->ip == ret_from_fork);
+    state->signal = false;
  break;

Yes that's correct.


Hi, josh

Could i ask when are you free to send the patch, all the tests are 
passed by.


thanks for your help.

Wang ShaoBo







Re: Question: livepatch failed for new fork() task stack unreliable

2020-06-03 Thread Josh Poimboeuf
On Thu, Jun 04, 2020 at 09:24:55AM +0800, Wangshaobo (bobo) wrote:
> 
> 在 2020/6/3 23:33, Josh Poimboeuf 写道:
> > On Wed, Jun 03, 2020 at 10:06:07PM +0800, Wangshaobo (bobo) wrote:
> > To be honest, I don't remember what I meant by sibling calls.  They
> > don't even leave anything on the stack.
> > 
> > For noreturns, the code might be laid out like this:
> > 
> > func1:
> > ...
> > call noreturn_foo
> > func2:
> > 
> > func2 is immediately after the call to noreturn_foo.  So the return
> > address on the stack will actually be 'func2'.  We want to retrieve the
> > ORC data for the call instruction (inside func1), instead of the
> > instruction at the beginning of func2.
> > 
> > I should probably update that comment.
> 
> So, I want to ask is there any side effects if i modify like this ? this
> modification is based on
> 
> your fix. It looks like ok with proper test.
> 
> diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
> index e9cc182aa97e..ecce5051e8fd 100644
> --- a/arch/x86/kernel/unwind_orc.c
> +++ b/arch/x86/kernel/unwind_orc.c
> @@ -620,6 +620,7 @@ void __unwind_start(struct unwind_state *state, struct
> task_struct *task,
>     state->sp = task->thread.sp;
>     state->bp = READ_ONCE_NOCHECK(frame->bp);
>     state->ip = READ_ONCE_NOCHECK(frame->ret_addr);
> +  state->signal = ((void *)state->ip == ret_from_fork);
>     }
> 
> diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
> index 7f969b2d240f..d7396431261a 100644
> --- a/arch/x86/kernel/unwind_orc.c
> +++ b/arch/x86/kernel/unwind_orc.c
> @@ -540,7 +540,7 @@ bool unwind_next_frame(struct unwind_state *state)
>  state->sp = sp;
>  state->regs = NULL;
>  state->prev_regs = NULL;
> -    state->signal = ((void *)state->ip == ret_from_fork);
> +    state->signal = false;
>  break;

Yes that's correct.

-- 
Josh



Re: Question: livepatch failed for new fork() task stack unreliable

2020-06-03 Thread Wangshaobo (bobo)



在 2020/6/3 23:33, Josh Poimboeuf 写道:

On Wed, Jun 03, 2020 at 10:06:07PM +0800, Wangshaobo (bobo) wrote:
To be honest, I don't remember what I meant by sibling calls.  They
don't even leave anything on the stack.

For noreturns, the code might be laid out like this:

func1:
...
call noreturn_foo
func2:

func2 is immediately after the call to noreturn_foo.  So the return
address on the stack will actually be 'func2'.  We want to retrieve the
ORC data for the call instruction (inside func1), instead of the
instruction at the beginning of func2.

I should probably update that comment.


So, I want to ask is there any side effects if i modify like this ? this 
modification is based on


your fix. It looks like ok with proper test.

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index e9cc182aa97e..ecce5051e8fd 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -620,6 +620,7 @@ void __unwind_start(struct unwind_state *state, 
struct task_struct *task,

    state->sp = task->thread.sp;
    state->bp = READ_ONCE_NOCHECK(frame->bp);
    state->ip = READ_ONCE_NOCHECK(frame->ret_addr);
+  state->signal = ((void *)state->ip == ret_from_fork);
    }

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 7f969b2d240f..d7396431261a 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -540,7 +540,7 @@ bool unwind_next_frame(struct unwind_state *state)
 state->sp = sp;
 state->regs = NULL;
 state->prev_regs = NULL;
-    state->signal = ((void *)state->ip == ret_from_fork);
+    state->signal = false;
 break;

thanks,

Wang ShaoBo




Re: Question: livepatch failed for new fork() task stack unreliable

2020-06-03 Thread Josh Poimboeuf
On Wed, Jun 03, 2020 at 10:06:07PM +0800, Wangshaobo (bobo) wrote:
> Today I test your fix, but arch_stack_walk_reliable() still return failed
> sometimes, I
> 
> found one of three scenarios mentioned failed:
> 
> 
> 1. user task (just fork) but not been scheduled
> 
>     test FAILED
> 
>     it is because unwind_next_frame() get the first frame, this time
> state->signal is false, and then return
> 
>     failed in the same place for ret_from_fork has not executed at all.

Oops - I meant to do it in __unwind_start (as you discovered).

> 2. user task (just fork) start excuting ret_from_fork() till schedule_tail
> but not UNWIND_HINT_REGS
> 
>     test condition :loop fork() in current  system
> 
>     result : SUCCESS,
> 
>     it looks like this modification works for my perspective :
> 
>   -   /* Success path for non-user tasks, i.e. kthreads and idle 
> tasks */
>   -   if (!(task->flags & (PF_KTHREAD | PF_IDLE)))
>   -   return -EINVAL;
>   but is this possible to miss one invalid judgement condition ? (1)

I'm not sure I understand your question, but I think this change
shouldn't break anything else because the ORC unwinder is careful to
always set an error if it doesn't reach the end of the stack, especially
with my recent ORC fixes which went into 5.7.

> 3. call_usermodehelper_exec_async
> 
>     test condition :loop call call_usermodehelper() in a module selfmade.
> 
>     result : SUCCESS,
> 
>    it looks state->signal==true works when unwind_next_frame() gets the end
> ret_from_fork() frame,
> 
>    but i don't know how does it work, i am confused by this sentences, how
> does the comment means sibling calls and
> 
>     calls to noreturn functions? (2)
> 
>             /*
>          * Find the orc_entry associated with the text address.
>          *
>              * Decrement call return addresses by one so they work for sibling
>          * calls and calls to noreturn functions.
>          */
>             orc = orc_find(state->signal ? state->ip : state->ip - 1);
>             if (!orc) {

To be honest, I don't remember what I meant by sibling calls.  They
don't even leave anything on the stack.

For noreturns, the code might be laid out like this:

func1:
...
call noreturn_foo
func2:

func2 is immediately after the call to noreturn_foo.  So the return
address on the stack will actually be 'func2'.  We want to retrieve the
ORC data for the call instruction (inside func1), instead of the
instruction at the beginning of func2.

I should probably update that comment.

-- 
Josh



Re: Question: livepatch failed for new fork() task stack unreliable

2020-06-03 Thread Wangshaobo (bobo)



在 2020/6/2 21:14, Josh Poimboeuf 写道:

On Tue, Jun 02, 2020 at 09:22:30AM +0800, Wangshaobo (bobo) wrote:

so i think this question is related to ORC unwinder, could i ask if you have
strategy or plan to avoid this problem ?

I suspect something like this would fix it (untested):

diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 6ad43fc44556..8cf95ded1410 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -50,7 +50,7 @@ int arch_stack_walk_reliable(stack_trace_consume_fn 
consume_entry,
if (regs) {
/* Success path for user tasks */
if (user_mode(regs))
-   return 0;
+   break;
  
  			/*

 * Kernel mode registers on the stack indicate an
@@ -81,10 +81,6 @@ int arch_stack_walk_reliable(stack_trace_consume_fn 
consume_entry,
if (unwind_error())
return -EINVAL;
  
-	/* Success path for non-user tasks, i.e. kthreads and idle tasks */

-   if (!(task->flags & (PF_KTHREAD | PF_IDLE)))
-   return -EINVAL;
-
return 0;
  }
  
diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c

index 7f969b2d240f..d7396431261a 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -540,7 +540,7 @@ bool unwind_next_frame(struct unwind_state *state)
state->sp = sp;
state->regs = NULL;
state->prev_regs = NULL;
-   state->signal = false;
+   state->signal = ((void *)state->ip == ret_from_fork);
break;
  
  	case ORC_TYPE_REGS:


what a awesome job, thanks a lot, Josh

Today I test your fix, but arch_stack_walk_reliable() still return 
failed sometimes, I


found one of three scenarios mentioned failed:


1. user task (just fork) but not been scheduled

    test FAILED

    it is because unwind_next_frame() get the first frame, this time 
state->signal is false, and then return


    failed in the same place for ret_from_fork has not executed at all.


2. user task (just fork) start excuting ret_from_fork() till 
schedule_tail but not UNWIND_HINT_REGS


    test condition :loop fork() in current  system

    result : SUCCESS,

    it looks like this modification works for my perspective :

-   /* Success path for non-user tasks, i.e. kthreads and idle 
tasks */
-   if (!(task->flags & (PF_KTHREAD | PF_IDLE)))
-   return -EINVAL;
  but is this possible to miss one invalid judgement condition ? (1)

3. call_usermodehelper_exec_async

    test condition :loop call call_usermodehelper() in a module selfmade.

    result : SUCCESS,

   it looks state->signal==true works when unwind_next_frame() gets the 
end ret_from_fork() frame,


   but i don't know how does it work, i am confused by this sentences, 
how does the comment means sibling calls and


    calls to noreturn functions? (2)

            /*
         * Find the orc_entry associated with the text address.
         *
             * Decrement call return addresses by one so they work for 
sibling

         * calls and calls to noreturn functions.
         */
            orc = orc_find(state->signal ? state->ip : state->ip - 1);
            if (!orc) {


So i slightly modify your code, i move  state->signal = ((void 
*)state->ip == ret_from_fork) to unwind_start()


and render unwind_next_frame() remain the same as before:

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index e9cc182aa97e..ecce5051e8fd 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -620,6 +620,7 @@ void __unwind_start(struct unwind_state *state, 
struct task_struct *task,

    state->sp = task->thread.sp;
    state->bp = READ_ONCE_NOCHECK(frame->bp);
    state->ip = READ_ONCE_NOCHECK(frame->ret_addr);
+  state->signal = ((void *)state->ip == ret_from_fork);
    }

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 7f969b2d240f..d7396431261a 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -540,7 +540,7 @@ bool unwind_next_frame(struct unwind_state *state)
state->sp = sp;
state->regs = NULL;
state->prev_regs = NULL;
-   state->signal = ((void *)state->ip == ret_from_fork);
+   state->signal = false;
    break;


After modification all the three scenarios are captured and no longer 
return failed,  but i don't know


how does it affect the scenarios 3, because current frame->ret_addr(the 
first frame) is not ret_from_fork,


it should return failed as scenarios1, but it didn't , i really want to 
know the reason. (3)



thanks again

Wang ShaoBo




Re: Question: livepatch failed for new fork() task stack unreliable

2020-06-02 Thread Josh Poimboeuf
On Tue, Jun 02, 2020 at 09:22:30AM +0800, Wangshaobo (bobo) wrote:
> so i think this question is related to ORC unwinder, could i ask if you have
> strategy or plan to avoid this problem ?

I suspect something like this would fix it (untested):

diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 6ad43fc44556..8cf95ded1410 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -50,7 +50,7 @@ int arch_stack_walk_reliable(stack_trace_consume_fn 
consume_entry,
if (regs) {
/* Success path for user tasks */
if (user_mode(regs))
-   return 0;
+   break;
 
/*
 * Kernel mode registers on the stack indicate an
@@ -81,10 +81,6 @@ int arch_stack_walk_reliable(stack_trace_consume_fn 
consume_entry,
if (unwind_error())
return -EINVAL;
 
-   /* Success path for non-user tasks, i.e. kthreads and idle tasks */
-   if (!(task->flags & (PF_KTHREAD | PF_IDLE)))
-   return -EINVAL;
-
return 0;
 }
 
diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 7f969b2d240f..d7396431261a 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -540,7 +540,7 @@ bool unwind_next_frame(struct unwind_state *state)
state->sp = sp;
state->regs = NULL;
state->prev_regs = NULL;
-   state->signal = false;
+   state->signal = ((void *)state->ip == ret_from_fork);
break;
 
case ORC_TYPE_REGS:



Re: Question: livepatch failed for new fork() task stack unreliable

2020-06-01 Thread Wangshaobo (bobo)



在 2020/6/2 2:05, Josh Poimboeuf 写道:

On Sat, May 30, 2020 at 10:21:19AM +0800, Wangshaobo (bobo) wrote:

1) when a user mode task just fork start excuting ret_from_fork() till
schedule_tail, unwind_next_frame found

orc->sp_reg is ORC_REG_UNDEFINED but orc->end not equals zero, this time
arch_stack_walk_reliable()

terminates it's backtracing loop for unwind_done() return true. then 'if
(!(task->flags & (PF_KTHREAD | PF_IDLE)))'

in arch_stack_walk_reliable() true and return -EINVAL after.

* The stack trace looks like that:

ret_from_fork

   -=> UNWIND_HINT_EMPTY

   -=> schedule_tail /* schedule out */

   ...

   -=> UNWIND_HINT_REGS  /*  UNDO */

Yes, makes sense.


2) when using call_usermodehelper_exec_async() to create a user mode task,
ret_from_fork() still not exec whereas

the task has been scheduled in __schedule(), at this time, orc->sp_reg is
ORC_REG_UNDEFINED but orc->end equals zero,

unwind_error() return true and also terminates arch_stack_walk_reliable()'s
backtracing loop, end up return from

'if (unwind_error())' branch.

* The stack trace looks like that:

-=> call_usermodehelper_exec

                  -=> do_exec

    -=> search_binary_handler

   -=> load_elf_binary

     -=> elf_map

  -=> vm_mmap_pgoff

-=> down_write_killable

-=> _cond_resched

  -=> __schedule   /* scheduled to work */

-=> ret_from_fork   /* UNDO */

I don't quite follow the stacktrace, but it sounds like the issue is the
same as the first one you originally reported:


yes, true, same as the first one,  the only difference what i want to 
say is the task has been scheduled but the first one is not.



1) The task was not actually scheduled to excute, at this time
UNWIND_HINT_EMPTY in ret_from_fork() has not reset unwind_hint, it's
sp_reg and end field remain default value and end up throwing an error
in unwind_next_frame() when called by arch_stack_walk_reliable();

Or am I misunderstanding?

And to reiterate, these are not "livepatch failures", right?  Livepatch
doesn't fail when stack_trace_save_tsk_reliable() returns an error.  It
recovers gracefully and tries again later.


yes, you are right,  "livepatch failures" only indicates serveral retry 
failures, we found if frequent fork() happend in current


system, it is easier to cause retry but still always end up success.

so i think this question is related to ORC unwinder, could i ask if you 
have strategy or plan to avoid this problem ?


thanks,

Wang ShaoBo




Re: Question: livepatch failed for new fork() task stack unreliable

2020-06-01 Thread Josh Poimboeuf
On Sat, May 30, 2020 at 10:21:19AM +0800, Wangshaobo (bobo) wrote:
> 1) when a user mode task just fork start excuting ret_from_fork() till
> schedule_tail, unwind_next_frame found
> 
> orc->sp_reg is ORC_REG_UNDEFINED but orc->end not equals zero, this time
> arch_stack_walk_reliable()
> 
> terminates it's backtracing loop for unwind_done() return true. then 'if
> (!(task->flags & (PF_KTHREAD | PF_IDLE)))'
> 
> in arch_stack_walk_reliable() true and return -EINVAL after.
> 
> * The stack trace looks like that:
> 
> ret_from_fork
> 
>   -=> UNWIND_HINT_EMPTY
> 
>   -=> schedule_tail /* schedule out */
> 
>   ...
> 
>   -=> UNWIND_HINT_REGS  /*  UNDO */

Yes, makes sense.

> 2) when using call_usermodehelper_exec_async() to create a user mode task,
> ret_from_fork() still not exec whereas
> 
> the task has been scheduled in __schedule(), at this time, orc->sp_reg is
> ORC_REG_UNDEFINED but orc->end equals zero,
> 
> unwind_error() return true and also terminates arch_stack_walk_reliable()'s
> backtracing loop, end up return from
> 
> 'if (unwind_error())' branch.
> 
> * The stack trace looks like that:
> 
> -=> call_usermodehelper_exec
> 
>                  -=> do_exec
> 
>    -=> search_binary_handler
> 
>   -=> load_elf_binary
> 
>     -=> elf_map
> 
>  -=> vm_mmap_pgoff
> 
> -=> down_write_killable
> 
> -=> _cond_resched
> 
>  -=> __schedule   /* scheduled to work */
> 
> -=> ret_from_fork   /* UNDO */

I don't quite follow the stacktrace, but it sounds like the issue is the
same as the first one you originally reported:

> 1) The task was not actually scheduled to excute, at this time
> UNWIND_HINT_EMPTY in ret_from_fork() has not reset unwind_hint, it's
> sp_reg and end field remain default value and end up throwing an error
> in unwind_next_frame() when called by arch_stack_walk_reliable();

Or am I misunderstanding?

And to reiterate, these are not "livepatch failures", right?  Livepatch
doesn't fail when stack_trace_save_tsk_reliable() returns an error.  It
recovers gracefully and tries again later.

-- 
Josh



Re: Question: livepatch failed for new fork() task stack unreliable

2020-05-29 Thread Josh Poimboeuf
On Fri, May 29, 2020 at 06:10:59PM +0800, Wang ShaoBo wrote:
> Stack unreliable error is reported by stack_trace_save_tsk_reliable() when 
> trying
> to insmod a hot patch for module modification, this results in frequent 
> failures
> sometimes. We found this 'unreliable' stack is from task just fork.

For livepatch, this shouldn't actually be a failure.  The patch will
just stay in the transition state until after the fork has completed.
Which should happen in a reasonable amount of time, right?

> 1) The task was not actually scheduled to excute, at this time 
> UNWIND_HINT_EMPTY in
> ret_from_fork() has not reset unwind_hint, it's sp_reg and end field remain 
> default value
> and end up throwing an error in unwind_next_frame() when called by 
> arch_stack_walk_reliable();

Yes, this seems to be true for forked-but-not-yet-scheduled tasks.

I can look at fixing that.  I have some ORC cleanups in progress which
are related to UNWIND_HINT_EMPTY and the end of the stack.  I can add
this issue to the list of improvements.

> 2) The task has been scheduled but UNWIND_HINT_REGS not finished, at this time
> arch_stack_walk_reliable() terminates it's backtracing loop for pt_regs 
> unknown
> and return -EINVAL because it's a user task.

Hm, do you see this problem with upstream?  It seems like it should
work.  arch_stack_walk_reliable() has this:

/* Success path for user tasks */
if (user_mode(regs))
return 0;

Where exactly is the error coming from?

-- 
Josh