Re: [PATCH 4/7] powerpc/ftrace: Additionally nop out the preceding mflr with -mprofile-kernel

2019-06-19 Thread Naveen N. Rao

Nicholas Piggin wrote:
> Naveen N. Rao's on June 19, 2019 7:53 pm:
>> Nicholas Piggin wrote:
>>> Michael Ellerman's on June 19, 2019 3:14 pm:
>>>> I'm also not convinced the ordering between the two patches is
>>>> guaranteed by the ISA, given that there's possibly no isync on the other
>>>> CPU.
>>> 
>>> Will they go through a context synchronizing event?
>>> 
>>> synchronize_rcu_tasks() should ensure a thread is scheduled away, but
>>> I'm not actually sure it guarantees CSI if it's kernel->kernel. Could
>>> do a smp_call_function to do the isync on each CPU to be sure.
>> 
>> Good point. Per 
>> Documentation/RCU/Design/Requirements/Requirements.html#Tasks RCU:
>> "The solution, in the form of Tasks RCU, is to have implicit read-side 
>> critical sections that are delimited by voluntary context switches, that 
>> is, calls to schedule(), cond_resched(), and synchronize_rcu_tasks(). In 
>> addition, transitions to and from userspace execution also delimit 
>> tasks-RCU read-side critical sections."
>> 
>> I suppose transitions to/from userspace, as well as calls to schedule() 
>> result in a context synchronizing instruction being executed. But, if some 
>> tasks call cond_resched() and synchronize_rcu_tasks(), we probably won't 
>> have a CSI executed.
>> 
>> Also:
>> "In CONFIG_PREEMPT=n kernels, trampolines cannot be preempted, so these 
>> APIs map to call_rcu(), synchronize_rcu(), and rcu_barrier(), 
>> respectively."
>> 
>> In this scenario as well, I think we won't have a CSI executed in case 
>> of cond_resched().
>> 
>> Should we enhance patch_instruction() to handle that?
> 
> Well, not sure. Do we have many post-boot callers of it? Should
> they take care of their own synchronization requirements?


Kprobes and ftrace are the two users (along with anything else that may 
use jump labels).


Looking at this from the CMODX perspective: the main example quoted of 
an erratic behavior is when any variant of the patched instruction 
causes an exception.


With ftrace, I think we are ok since we only ever patch a 'nop' or a 
'bl' (and the 'mflr' now), none of which should cause an exception. As 
such, the existing patch_instruction() should suffice.


However, with kprobes, we patch a 'trap' (or a branch in case of 
optprobes) on most instructions. I wonder if we should be issuing an 
'isync' on all cpus in this case. Or, even if that is sufficient or 
necessary.
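
Something like the following completely untested sketch (the helper names
are made up, not existing kernel functions) is one way to force a context
synchronizing instruction on all online cpus after the trap is patched in:

#include <linux/smp.h>

static void do_isync(void *unused)
{
        /* isync is context synchronizing */
        asm volatile("isync" : : : "memory");
}

static void cmodx_sync_all_cpus(void)
{
        /* wait=1: return only after every other cpu has run the isync */
        smp_call_function(do_isync, NULL, 1);
        /* smp_call_function() skips the calling cpu, so isync here too */
        asm volatile("isync" : : : "memory");
}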



Thanks,
Naveen




Re: [PATCH 4/7] powerpc/ftrace: Additionally nop out the preceding mflr with -mprofile-kernel

2019-06-19 Thread Nicholas Piggin
Naveen N. Rao's on June 19, 2019 7:53 pm:
> Nicholas Piggin wrote:
>> Michael Ellerman's on June 19, 2019 3:14 pm:
>>> Hi Naveen,
>>> 
>>> Sorry I meant to reply to this earlier .. :/
> 
> No problem. Thanks for the questions.
> 
>>> 
>>> "Naveen N. Rao"  writes:
>>>> With -mprofile-kernel, gcc emits 'mflr r0', followed by 'bl _mcount' to
>>>> enable function tracing and profiling. So far, with dynamic ftrace, we
>>>> used to only patch out the branch to _mcount(). However, mflr is
>>>> executed by the branch unit that can only execute one per cycle on
>>>> POWER9 and shared with branches, so it would be nice to avoid it where
>>>> possible.
>>>>
>>>> We cannot simply nop out the mflr either. When enabling function
>>>> tracing, there can be a race if tracing is enabled when some thread was
>>>> interrupted after executing a nop'ed out mflr. In this case, the thread
>>>> would execute the now-patched-in branch to _mcount() without having
>>>> executed the preceding mflr.
>>>>
>>>> To solve this, we now enable function tracing in 2 steps: patch in the
>>>> mflr instruction, use synchronize_rcu_tasks() to ensure all existing
>>>> threads make progress, and then patch in the branch to _mcount(). We
>>>> override ftrace_replace_code() with a powerpc64 variant for this
>>>> purpose.
>>> 
>>> According to the ISA we're not allowed to patch mflr at runtime. See the
>>> section on "CMODX".
>> 
>> According to "quasi patch class" engineering note, we can patch
>> anything with a preferred nop. But that's written as an optional
>> facility, which we don't have a feature to test for.
>> 
> 
> Hmm... I wonder what the implications are. We've been patching in a 
> 'trap' for kprobes for a long time now, along with having to patch back 
> the original instruction (which can be anything), when the probe is 
> removed.

Will have to check what implementations support "quasi patch class"
instructions. IIRC recent POWER processors are okay. May have to add
a feature test though.
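
If we did grow such a feature bit, the test itself would be trivial; a
hypothetical sketch only (CPU_FTR_QUASI_PATCH_CLASS is an invented name,
not a real bit in cputable.h today):

#include <asm/cputable.h>         /* CPU_FTR_* bit definitions */
#include <asm/cpu_has_feature.h>

static bool quasi_patch_class_ok(void)
{
        /* invented feature bit, for illustration only */
        return cpu_has_feature(CPU_FTR_QUASI_PATCH_CLASS);
}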

>>> 
>>> I'm also not convinced the ordering between the two patches is
>>> guaranteed by the ISA, given that there's possibly no isync on the other
>>> CPU.
>> 
>> Will they go through a context synchronizing event?
>> 
>> synchronize_rcu_tasks() should ensure a thread is scheduled away, but
>> I'm not actually sure it guarantees CSI if it's kernel->kernel. Could
>> do a smp_call_function to do the isync on each CPU to be sure.
> 
> Good point. Per 
> Documentation/RCU/Design/Requirements/Requirements.html#Tasks RCU:
> "The solution, in the form of Tasks RCU, is to have implicit read-side 
> critical sections that are delimited by voluntary context switches, that 
> is, calls to schedule(), cond_resched(), and synchronize_rcu_tasks(). In 
> addition, transitions to and from userspace execution also delimit 
> tasks-RCU read-side critical sections."
> 
> I suppose transitions to/from userspace, as well as calls to schedule() 
> result in a context synchronizing instruction being executed. But, if some 
> tasks call cond_resched() and synchronize_rcu_tasks(), we probably won't 
> have a CSI executed.
> 
> Also:
> "In CONFIG_PREEMPT=n kernels, trampolines cannot be preempted, so these 
> APIs map to call_rcu(), synchronize_rcu(), and rcu_barrier(), 
> respectively."
> 
> In this scenario as well, I think we won't have a CSI executed in case 
> of cond_resched().
> 
> Should we enhance patch_instruction() to handle that?

Well, not sure. Do we have many post-boot callers of it? Should
they take care of their own synchronization requirements?

Thanks,
Nick


Re: [PATCH 4/7] powerpc/ftrace: Additionally nop out the preceding mflr with -mprofile-kernel

2019-06-19 Thread Naveen N. Rao

Nicholas Piggin wrote:
> Michael Ellerman's on June 19, 2019 3:14 pm:
>> Hi Naveen,
>> 
>> Sorry I meant to reply to this earlier .. :/

No problem. Thanks for the questions.

>> "Naveen N. Rao"  writes:
>>> With -mprofile-kernel, gcc emits 'mflr r0', followed by 'bl _mcount' to
>>> enable function tracing and profiling. So far, with dynamic ftrace, we
>>> used to only patch out the branch to _mcount(). However, mflr is
>>> executed by the branch unit that can only execute one per cycle on
>>> POWER9 and shared with branches, so it would be nice to avoid it where
>>> possible.
>>>
>>> We cannot simply nop out the mflr either. When enabling function
>>> tracing, there can be a race if tracing is enabled when some thread was
>>> interrupted after executing a nop'ed out mflr. In this case, the thread
>>> would execute the now-patched-in branch to _mcount() without having
>>> executed the preceding mflr.
>>>
>>> To solve this, we now enable function tracing in 2 steps: patch in the
>>> mflr instruction, use synchronize_rcu_tasks() to ensure all existing
>>> threads make progress, and then patch in the branch to _mcount(). We
>>> override ftrace_replace_code() with a powerpc64 variant for this
>>> purpose.
>> 
>> According to the ISA we're not allowed to patch mflr at runtime. See the
>> section on "CMODX".
> 
> According to "quasi patch class" engineering note, we can patch
> anything with a preferred nop. But that's written as an optional
> facility, which we don't have a feature to test for.



Hmm... I wonder what the implications are. We've been patching in a 
'trap' for kprobes for a long time now, along with having to patch back 
the original instruction (which can be anything), when the probe is 
removed.
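
(For reference, the pattern in question is roughly the following; this is
a from-scratch illustration rather than the actual kprobes code: arm by
writing a trap over the original word, disarm by writing the saved word
back, which can be any instruction at all.)

#include <asm/code-patching.h>    /* patch_instruction() */

#define TRAP_INSN       0x7fe00008      /* "trap" (tw 31,0,0) */

struct patched_site {
        unsigned int *addr;     /* probed instruction address */
        unsigned int saved;     /* original instruction word */
};

static int site_arm(struct patched_site *s)
{
        s->saved = *s->addr;
        return patch_instruction(s->addr, TRAP_INSN);
}

static int site_disarm(struct patched_site *s)
{
        return patch_instruction(s->addr, s->saved);
}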




>> I'm also not convinced the ordering between the two patches is
>> guaranteed by the ISA, given that there's possibly no isync on the other
>> CPU.
> 
> Will they go through a context synchronizing event?
> 
> synchronize_rcu_tasks() should ensure a thread is scheduled away, but
> I'm not actually sure it guarantees CSI if it's kernel->kernel. Could
> do a smp_call_function to do the isync on each CPU to be sure.


Good point. Per 
Documentation/RCU/Design/Requirements/Requirements.html#Tasks RCU:
"The solution, in the form of Tasks RCU, is to have implicit read-side 
critical sections that are delimited by voluntary context switches, that 
is, calls to schedule(), cond_resched(), and synchronize_rcu_tasks(). In 
addition, transitions to and from userspace execution also delimit 
tasks-RCU read-side critical sections."


I suppose transitions to/from userspace, as well as calls to schedule() 
result in a context synchronizing instruction being executed. But, if some 
tasks call cond_resched() and synchronize_rcu_tasks(), we probably won't 
have a CSI executed.


Also:
"In CONFIG_PREEMPT=n kernels, trampolines cannot be preempted, so these 
APIs map to call_rcu(), synchronize_rcu(), and rcu_barrier(), 
respectively."


In this scenario as well, I think we won't have a CSI executed in case 
of cond_resched().


Should we enhance patch_instruction() to handle that?
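
To make that concrete, one possible shape (hypothetical and untested;
patch_instruction_sync() is not an existing function) would be a
synchronizing wrapper built on the smp_call_function idea quoted above,
rather than changing patch_instruction() itself:

#include <linux/smp.h>
#include <asm/code-patching.h>

static void isync_ipi(void *unused)
{
        asm volatile("isync" : : : "memory");
}

int patch_instruction_sync(unsigned int *addr, unsigned int instr)
{
        int err = patch_instruction(addr, instr);

        /* on_each_cpu() runs the callback on every online cpu, including us */
        if (!err)
                on_each_cpu(isync_ipi, NULL, 1);
        return err;
}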


- Naveen



Re: [PATCH 4/7] powerpc/ftrace: Additionally nop out the preceding mflr with -mprofile-kernel

2019-06-19 Thread Nicholas Piggin
Michael Ellerman's on June 19, 2019 3:14 pm:
> Hi Naveen,
> 
> Sorry I meant to reply to this earlier .. :/
> 
> "Naveen N. Rao"  writes:
>> With -mprofile-kernel, gcc emits 'mflr r0', followed by 'bl _mcount' to
>> enable function tracing and profiling. So far, with dynamic ftrace, we
>> used to only patch out the branch to _mcount(). However, mflr is
>> executed by the branch unit that can only execute one per cycle on
>> POWER9 and shared with branches, so it would be nice to avoid it where
>> possible.
>>
>> We cannot simply nop out the mflr either. When enabling function
>> tracing, there can be a race if tracing is enabled when some thread was
>> interrupted after executing a nop'ed out mflr. In this case, the thread
>> would execute the now-patched-in branch to _mcount() without having
>> executed the preceding mflr.
>>
>> To solve this, we now enable function tracing in 2 steps: patch in the
>> mflr instruction, use synchronize_rcu_tasks() to ensure all existing
>> threads make progress, and then patch in the branch to _mcount(). We
>> override ftrace_replace_code() with a powerpc64 variant for this
>> purpose.
> 
> According to the ISA we're not allowed to patch mflr at runtime. See the
> section on "CMODX".

According to "quasi patch class" engineering note, we can patch
anything with a preferred nop. But that's written as an optional
facility, which we don't have a feature to test for.

> 
> I'm also not convinced the ordering between the two patches is
> guaranteed by the ISA, given that there's possibly no isync on the other
> CPU.

Will they go through a context synchronizing event?

synchronize_rcu_tasks() should ensure a thread is scheduled away, but
I'm not actually sure it guarantees CSI if it's kernel->kernel. Could
do a smp_call_function to do the isync on each CPU to be sure.

Thanks,
Nick



Re: [PATCH 4/7] powerpc/ftrace: Additionally nop out the preceding mflr with -mprofile-kernel

2019-06-18 Thread Michael Ellerman
Hi Naveen,

Sorry I meant to reply to this earlier .. :/

"Naveen N. Rao"  writes:
> With -mprofile-kernel, gcc emits 'mflr r0', followed by 'bl _mcount' to
> enable function tracing and profiling. So far, with dynamic ftrace, we
> used to only patch out the branch to _mcount(). However, mflr is
> executed by the branch unit that can only execute one per cycle on
> POWER9 and shared with branches, so it would be nice to avoid it where
> possible.
>
> We cannot simply nop out the mflr either. When enabling function
> tracing, there can be a race if tracing is enabled when some thread was
> interrupted after executing a nop'ed out mflr. In this case, the thread
> would execute the now-patched-in branch to _mcount() without having
> executed the preceding mflr.
>
> To solve this, we now enable function tracing in 2 steps: patch in the
> mflr instruction, use synchronize_rcu_tasks() to ensure all existing
> threads make progress, and then patch in the branch to _mcount(). We
> override ftrace_replace_code() with a powerpc64 variant for this
> purpose.

According to the ISA we're not allowed to patch mflr at runtime. See the
section on "CMODX".

I'm also not convinced the ordering between the two patches is
guaranteed by the ISA, given that there's possibly no isync on the other
CPU.

But I haven't had time to dig into it sorry, hopefully later in the
week?

cheers

> diff --git a/arch/powerpc/kernel/trace/ftrace.c 
> b/arch/powerpc/kernel/trace/ftrace.c
> index 517662a56bdc..5e2b29808af1 100644
> --- a/arch/powerpc/kernel/trace/ftrace.c
> +++ b/arch/powerpc/kernel/trace/ftrace.c
> @@ -125,7 +125,7 @@ __ftrace_make_nop(struct module *mod,
>  {
>   unsigned long entry, ptr, tramp;
>   unsigned long ip = rec->ip;
> - unsigned int op, pop;
> + unsigned int op;
>  
>   /* read where this goes */
>   if (probe_kernel_read(&op, (void *)ip, sizeof(int))) {
> @@ -160,8 +160,6 @@ __ftrace_make_nop(struct module *mod,
>  
>  #ifdef CONFIG_MPROFILE_KERNEL
>   /* When using -mkernel_profile there is no load to jump over */
> - pop = PPC_INST_NOP;
> -
>   if (probe_kernel_read(&op, (void *)(ip - 4), 4)) {
>   pr_err("Fetching instruction at %lx failed.\n", ip - 4);
>   return -EFAULT;
> @@ -169,26 +167,23 @@ __ftrace_make_nop(struct module *mod,
>  
>   /* We expect either a mflr r0, or a std r0, LRSAVE(r1) */
>   if (op != PPC_INST_MFLR && op != PPC_INST_STD_LR) {
> - pr_err("Unexpected instruction %08x around bl _mcount\n", op);
> + pr_err("Unexpected instruction %08x before bl _mcount\n", op);
>   return -EINVAL;
>   }
> -#else
> - /*
> -  * Our original call site looks like:
> -  *
> -  * bl 
> -  * ld r2,XX(r1)
> -  *
> -  * Milton Miller pointed out that we can not simply nop the branch.
> -  * If a task was preempted when calling a trace function, the nops
> -  * will remove the way to restore the TOC in r2 and the r2 TOC will
> -  * get corrupted.
> -  *
> -  * Use a b +8 to jump over the load.
> -  */
>  
> - pop = PPC_INST_BRANCH | 8;  /* b +8 */
> + /* We should patch out the bl to _mcount first */
> + if (patch_instruction((unsigned int *)ip, PPC_INST_NOP)) {
> + pr_err("Patching NOP failed.\n");
> + return -EPERM;
> + }
>  
> + /* then, nop out the preceding 'mflr r0' as an optimization */
> + if (op == PPC_INST_MFLR &&
> + patch_instruction((unsigned int *)(ip - 4), PPC_INST_NOP)) {
> + pr_err("Patching NOP failed.\n");
> + return -EPERM;
> + }
> +#else
>   /*
>* Check what is in the next instruction. We can see ld r2,40(r1), but
>* on first pass after boot we will see mflr r0.
> @@ -202,12 +197,25 @@ __ftrace_make_nop(struct module *mod,
>   pr_err("Expected %08x found %08x\n", PPC_INST_LD_TOC, op);
>   return -EINVAL;
>   }
> -#endif /* CONFIG_MPROFILE_KERNEL */
>  
> - if (patch_instruction((unsigned int *)ip, pop)) {
> + /*
> +  * Our original call site looks like:
> +  *
> +  * bl 
> +  * ld r2,XX(r1)
> +  *
> +  * Milton Miller pointed out that we can not simply nop the branch.
> +  * If a task was preempted when calling a trace function, the nops
> +  * will remove the way to restore the TOC in r2 and the r2 TOC will
> +  * get corrupted.
> +  *
> +  * Use a b +8 to jump over the load.
> +  */
> + if (patch_instruction((unsigned int *)ip, PPC_INST_BRANCH | 8)) {
>   pr_err("Patching NOP failed.\n");
>   return -EPERM;
>   }
> +#endif /* CONFIG_MPROFILE_KERNEL */
>  
>   return 0;
>  }
> @@ -421,6 +429,26 @@ static int __ftrace_make_nop_kernel(struct dyn_ftrace 
> *rec, unsigned long addr)
>   return -EPERM;
>   }
>  
> +#ifdef CONFIG_MPROFILE_KERNEL
> + /* Nop out the preceding 'mflr r0' as an optim

[PATCH 4/7] powerpc/ftrace: Additionally nop out the preceding mflr with -mprofile-kernel

2019-06-18 Thread Naveen N. Rao
With -mprofile-kernel, gcc emits 'mflr r0', followed by 'bl _mcount' to
enable function tracing and profiling. So far, with dynamic ftrace, we
used to only patch out the branch to _mcount(). However, mflr is
executed by the branch unit that can only execute one per cycle on
POWER9 and shared with branches, so it would be nice to avoid it where
possible.

We cannot simply nop out the mflr either. When enabling function
tracing, there can be a race if tracing is enabled when some thread was
interrupted after executing a nop'ed out mflr. In this case, the thread
would execute the now-patched-in branch to _mcount() without having
executed the preceding mflr.

To solve this, we now enable function tracing in 2 steps: patch in the
mflr instruction, use synchronize_rcu_tasks() to ensure all existing
threads make progress, and then patch in the branch to _mcount(). We
override ftrace_replace_code() with a powerpc64 variant for this
purpose.

Suggested-by: Nicholas Piggin 
Reviewed-by: Nicholas Piggin 
Signed-off-by: Naveen N. Rao 
---
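
An illustrative sketch of the enable-time sequence described above (this
is not the code in the diff; the function name is made up and 'bl_inst'
stands in for the branch-to-_mcount encoding computed for the site):

#include <linux/rcupdate.h>
#include <asm/code-patching.h>
#include <asm/ppc-opcode.h>

static int enable_mcount_site(unsigned long ip, unsigned int bl_inst)
{
        /* step 1: restore the 'mflr r0' preceding the (currently nop'ed) call */
        if (patch_instruction((unsigned int *)(ip - 4), PPC_INST_MFLR))
                return -EPERM;

        /*
         * step 2: wait for every task to pass through a voluntary context
         * switch or a user entry/exit, so no task is still parked between
         * the two instructions having executed the old nop at ip - 4
         */
        synchronize_rcu_tasks();

        /* step 3: only now patch the call to _mcount() back in */
        return patch_instruction((unsigned int *)ip, bl_inst);
}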
 arch/powerpc/kernel/trace/ftrace.c | 241 ++---
 1 file changed, 219 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 517662a56bdc..5e2b29808af1 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -125,7 +125,7 @@ __ftrace_make_nop(struct module *mod,
 {
unsigned long entry, ptr, tramp;
unsigned long ip = rec->ip;
-   unsigned int op, pop;
+   unsigned int op;
 
/* read where this goes */
if (probe_kernel_read(&op, (void *)ip, sizeof(int))) {
@@ -160,8 +160,6 @@ __ftrace_make_nop(struct module *mod,
 
 #ifdef CONFIG_MPROFILE_KERNEL
/* When using -mkernel_profile there is no load to jump over */
-   pop = PPC_INST_NOP;
-
if (probe_kernel_read(&op, (void *)(ip - 4), 4)) {
pr_err("Fetching instruction at %lx failed.\n", ip - 4);
return -EFAULT;
@@ -169,26 +167,23 @@ __ftrace_make_nop(struct module *mod,
 
/* We expect either a mflr r0, or a std r0, LRSAVE(r1) */
if (op != PPC_INST_MFLR && op != PPC_INST_STD_LR) {
-   pr_err("Unexpected instruction %08x around bl _mcount\n", op);
+   pr_err("Unexpected instruction %08x before bl _mcount\n", op);
return -EINVAL;
}
-#else
-   /*
-* Our original call site looks like:
-*
-* bl 
-* ld r2,XX(r1)
-*
-* Milton Miller pointed out that we can not simply nop the branch.
-* If a task was preempted when calling a trace function, the nops
-* will remove the way to restore the TOC in r2 and the r2 TOC will
-* get corrupted.
-*
-* Use a b +8 to jump over the load.
-*/
 
-   pop = PPC_INST_BRANCH | 8;  /* b +8 */
+   /* We should patch out the bl to _mcount first */
+   if (patch_instruction((unsigned int *)ip, PPC_INST_NOP)) {
+   pr_err("Patching NOP failed.\n");
+   return -EPERM;
+   }
 
+   /* then, nop out the preceding 'mflr r0' as an optimization */
+   if (op == PPC_INST_MFLR &&
+   patch_instruction((unsigned int *)(ip - 4), PPC_INST_NOP)) {
+   pr_err("Patching NOP failed.\n");
+   return -EPERM;
+   }
+#else
/*
 * Check what is in the next instruction. We can see ld r2,40(r1), but
 * on first pass after boot we will see mflr r0.
@@ -202,12 +197,25 @@ __ftrace_make_nop(struct module *mod,
pr_err("Expected %08x found %08x\n", PPC_INST_LD_TOC, op);
return -EINVAL;
}
-#endif /* CONFIG_MPROFILE_KERNEL */
 
-   if (patch_instruction((unsigned int *)ip, pop)) {
+   /*
+* Our original call site looks like:
+*
+* bl 
+* ld r2,XX(r1)
+*
+* Milton Miller pointed out that we can not simply nop the branch.
+* If a task was preempted when calling a trace function, the nops
+* will remove the way to restore the TOC in r2 and the r2 TOC will
+* get corrupted.
+*
+* Use a b +8 to jump over the load.
+*/
+   if (patch_instruction((unsigned int *)ip, PPC_INST_BRANCH | 8)) {
pr_err("Patching NOP failed.\n");
return -EPERM;
}
+#endif /* CONFIG_MPROFILE_KERNEL */
 
return 0;
 }
@@ -421,6 +429,26 @@ static int __ftrace_make_nop_kernel(struct dyn_ftrace 
*rec, unsigned long addr)
return -EPERM;
}
 
+#ifdef CONFIG_MPROFILE_KERNEL
+   /* Nop out the preceding 'mflr r0' as an optimization */
+   if (probe_kernel_read(&op, (void *)(ip - 4), 4)) {
+   pr_err("Fetching instruction at %lx failed.\n", ip - 4);
+   return -EFAULT;
+   }
+
+   /* We expect either a mflr r0, or a std r0, LRSAVE(r1) */
+   if (op != P