Re: [PATCHv4] arm: ftrace: Adds support for CONFIG_DYNAMIC_FTRACE_WITH_REGS

2017-03-02 Thread Russell King - ARM Linux
On Thu, Mar 02, 2017 at 09:01:24PM +, Abel Vesa wrote:
> On Tue, Feb 28, 2017 at 11:54:29AM +, Abel Vesa wrote:
> > On Tue, Feb 28, 2017 at 11:46:38AM +, Russell King - ARM Linux wrote:
> > > On Tue, Feb 28, 2017 at 11:22:27AM +, Abel Vesa wrote:
> > > > On Tue, Feb 28, 2017 at 11:58:49AM +0100, Nicolai Stange wrote:
> > > > > Hi Abel,
> > > > > 
> > > > > On Tue, Feb 28 2017, Abel Vesa wrote:
> > > > > 
> > > > > > On Mon, Feb 27, 2017 at 04:52:06PM +0100, Nicolai Stange wrote:
> > > > > >> On Fri, Feb 24 2017, Abel Vesa wrote:
> > > > > >> Wouldn't it be better (and more consistent with other archs) to 
> > > > > >> have
> > > > > >> 
> > > > > >>   pt_regs->ARM_lr = original lr
> > > > > >>   pt_refs->ARM_pc = current lr
> > > > > >> 
> > > > > >> instead?
> > > > > 
> > > > > The stack would look like this then
> > > > > 
> > > > > @ ...   | ARM_ip | ARM_sp | ARM_lr  | ARM_pc  | ...   
> > > > >|
> > > > > @ 0  4  48   52   566064
> > > > > 68   72
> > > > > @ R0 | R1 | ... | LR | SP + 4 | original LR | original PC | PSR | 
> > > > > OLD_R0 | original LR |
> Just to make sure we're on the same page. If we are replacing the LR 
> with the original_LR is it worth keeping around the one pushed before 
> the ftrace_regs_caller is called?
> 
> Another thing, PC needs to be new_LR and then we can restore all 
> regs r0 through r15 like this:
> 
>   ldmia   sp, {r0-r15}

That's the intention - the point is to save the state as it was at the
point that the function was entered, not at the point when the ftrace
code was entered.  What we don't want is the implementation details of
GCC's mcount or ftrace's adaption of mcount being exposed via ftrace.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCHv4] arm: ftrace: Adds support for CONFIG_DYNAMIC_FTRACE_WITH_REGS

2017-03-02 Thread Abel Vesa
On Tue, Feb 28, 2017 at 11:54:29AM +, Abel Vesa wrote:
> On Tue, Feb 28, 2017 at 11:46:38AM +, Russell King - ARM Linux wrote:
> > On Tue, Feb 28, 2017 at 11:22:27AM +, Abel Vesa wrote:
> > > On Tue, Feb 28, 2017 at 11:58:49AM +0100, Nicolai Stange wrote:
> > > > Hi Abel,
> > > > 
> > > > On Tue, Feb 28 2017, Abel Vesa wrote:
> > > > 
> > > > > On Mon, Feb 27, 2017 at 04:52:06PM +0100, Nicolai Stange wrote:
> > > > >> On Fri, Feb 24 2017, Abel Vesa wrote:
> > > > >> Wouldn't it be better (and more consistent with other archs) to have
> > > > >> 
> > > > >>   pt_regs->ARM_lr = original lr
> > > > >>   pt_refs->ARM_pc = current lr
> > > > >> 
> > > > >> instead?
> > > > 
> > > > The stack would look like this then
> > > > 
> > > > @ ...   | ARM_ip | ARM_sp | ARM_lr  | ARM_pc  | ... 
> > > >  |
> > > > @ 0  4  48   52   56606468  
> > > >  72
> > > > @ R0 | R1 | ... | LR | SP + 4 | original LR | original PC | PSR | 
> > > > OLD_R0 | original LR |
Just to make sure we're on the same page. If we are replacing the LR 
with the original_LR is it worth keeping around the one pushed before 
the ftrace_regs_caller is called?

Another thing, PC needs to be new_LR and then we can restore all 
regs r0 through r15 like this:

ldmia   sp, {r0-r15}
> > > > 
> > > > I.e. the pt_regs would capture almost the full context of the
> > > > instrumented function (except for ip).
> > > > 
> > > So basicly what you are saying is:
> > > - instead of current LR save original LR (previous one saved in 
> > > instrumented function epilog)
> > > - instead of current PC save original PC (previous one saved in 
> > > instrumented function epilog)
> > > 
> > > I still don't see the point of saving the actual value of PC since nobody 
> > > will ever
> > > restore it. In case of livepatch it will get overwritten anyway. As for 
> > > LR, I agree,
> > > it could be the original one in pt_regs.
> > > 
> > > I'll look into this sometime today or tomorrow and get back with updates.
> > 
> > Which is exactly what I proposed, with code, on one of the previous
> > iterations of this patch...
> Fair enough. I probably missunderstood your comments then.
> 
> Thanks.
> > 
> > -- 
> > RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
> > FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
> > according to speedtest.net.


Re: [PATCHv4] arm: ftrace: Adds support for CONFIG_DYNAMIC_FTRACE_WITH_REGS

2017-02-28 Thread Russell King - ARM Linux
On Tue, Feb 28, 2017 at 11:22:27AM +, Abel Vesa wrote:
> On Tue, Feb 28, 2017 at 11:58:49AM +0100, Nicolai Stange wrote:
> > Hi Abel,
> > 
> > On Tue, Feb 28 2017, Abel Vesa wrote:
> > 
> > > On Mon, Feb 27, 2017 at 04:52:06PM +0100, Nicolai Stange wrote:
> > >> On Fri, Feb 24 2017, Abel Vesa wrote:
> > >> Wouldn't it be better (and more consistent with other archs) to have
> > >> 
> > >>   pt_regs->ARM_lr = original lr
> > >>   pt_refs->ARM_pc = current lr
> > >> 
> > >> instead?
> > 
> > The stack would look like this then
> > 
> > @ ...   | ARM_ip | ARM_sp | ARM_lr  | ARM_pc  | ... 
> >  |
> > @ 0  4  48   52   56606468  
> >  72
> > @ R0 | R1 | ... | LR | SP + 4 | original LR | original PC | PSR | 
> > OLD_R0 | original LR |
> > 
> > I.e. the pt_regs would capture almost the full context of the
> > instrumented function (except for ip).
> > 
> So basicly what you are saying is:
> - instead of current LR save original LR (previous one saved in instrumented 
> function epilog)
> - instead of current PC save original PC (previous one saved in instrumented 
> function epilog)
> 
> I still don't see the point of saving the actual value of PC since nobody 
> will ever
> restore it. In case of livepatch it will get overwritten anyway. As for LR, I 
> agree,
> it could be the original one in pt_regs.
> 
> I'll look into this sometime today or tomorrow and get back with updates.

Which is exactly what I proposed, with code, on one of the previous
iterations of this patch...

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCHv4] arm: ftrace: Adds support for CONFIG_DYNAMIC_FTRACE_WITH_REGS

2017-02-28 Thread Abel Vesa
On Tue, Feb 28, 2017 at 11:46:38AM +, Russell King - ARM Linux wrote:
> On Tue, Feb 28, 2017 at 11:22:27AM +, Abel Vesa wrote:
> > On Tue, Feb 28, 2017 at 11:58:49AM +0100, Nicolai Stange wrote:
> > > Hi Abel,
> > > 
> > > On Tue, Feb 28 2017, Abel Vesa wrote:
> > > 
> > > > On Mon, Feb 27, 2017 at 04:52:06PM +0100, Nicolai Stange wrote:
> > > >> On Fri, Feb 24 2017, Abel Vesa wrote:
> > > >> Wouldn't it be better (and more consistent with other archs) to have
> > > >> 
> > > >>   pt_regs->ARM_lr = original lr
> > > >>   pt_refs->ARM_pc = current lr
> > > >> 
> > > >> instead?
> > > 
> > > The stack would look like this then
> > > 
> > > @ ...   | ARM_ip | ARM_sp | ARM_lr  | ARM_pc  | ...   
> > >|
> > > @ 0  4  48   52   56606468
> > >72
> > > @ R0 | R1 | ... | LR | SP + 4 | original LR | original PC | PSR | 
> > > OLD_R0 | original LR |
> > > 
> > > I.e. the pt_regs would capture almost the full context of the
> > > instrumented function (except for ip).
> > > 
> > So basicly what you are saying is:
> > - instead of current LR save original LR (previous one saved in 
> > instrumented function epilog)
> > - instead of current PC save original PC (previous one saved in 
> > instrumented function epilog)
> > 
> > I still don't see the point of saving the actual value of PC since nobody 
> > will ever
> > restore it. In case of livepatch it will get overwritten anyway. As for LR, 
> > I agree,
> > it could be the original one in pt_regs.
> > 
> > I'll look into this sometime today or tomorrow and get back with updates.
> 
> Which is exactly what I proposed, with code, on one of the previous
> iterations of this patch...
Fair enough. I probably missunderstood your comments then.

Thanks.
> 
> -- 
> RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
> according to speedtest.net.


Re: [PATCHv4] arm: ftrace: Adds support for CONFIG_DYNAMIC_FTRACE_WITH_REGS

2017-02-28 Thread Abel Vesa
On Tue, Feb 28, 2017 at 11:58:49AM +0100, Nicolai Stange wrote:
> Hi Abel,
> 
> On Tue, Feb 28 2017, Abel Vesa wrote:
> 
> > On Mon, Feb 27, 2017 at 04:52:06PM +0100, Nicolai Stange wrote:
> >> On Fri, Feb 24 2017, Abel Vesa wrote:
> >> 
> >> 
> >> 
> >> > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> >> > index fda6a46..877df5b 100644
> >> > --- a/arch/arm/Kconfig
> >> > +++ b/arch/arm/Kconfig
> >> > @@ -55,6 +55,7 @@ config ARM
> >> >  select HAVE_DMA_API_DEBUG
> >> >  select HAVE_DMA_CONTIGUOUS if MMU
> >> >  select HAVE_DYNAMIC_FTRACE if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 
> >> > && MMU
> >> > +select HAVE_DYNAMIC_FTRACE_WITH_REGS if HAVE_DYNAMIC_FTRACE && 
> >> > OLD_MCOUNT
> >> 
> >> 
> >> AFAICS, your code depends on the __gnu_mcount_nc calling conventions,
> >> i.e. on gcc pushing the original lr on the stack. In particular, there's
> >> no implementation of a ftrace_regs_caller_old or so.
> >> 
> >> So shouldn't this read as !OLD_MCOUNT instead?
> > The implementation of __ftrace_modify_code which sets the kernel text to rw
> > needs OLD_MCOUNT (that is, the arch specific one, not the generic one).
> 
> You're right that ARM's implementation of __ftrace_modify_code() is hidden
> within an #ifdef CONFIG_OLD_MCOUNT.
> 
> But,
>  - its implementation doesn't "need" or depend on OLD_MCOUNT
>  - and it's true in general that the kernel text must be made writable
>before ftrace_modify_all_code() attempts to write to it.
> 
> So, I bet that the set_kernel_text_rw()-calling ARM implementations of
> arch_ftrace_update_code() and __ftrace_modify_code() resp. have been
> inserted under that CONFIG_OLD_MCOUNT #ifdef by mistake with commit
> 80d6b0c2eed2 ("ARM: mm: allow text and rodata sections to be
> read-only").
> 
> In conclusion, I claim that DYNAMIC_FTRACE w/o OLD_MCOUNT had been
> broken before your patch already. I didn't explicitly test that though.
>
That is correct. The DYNAMIC_FTRACE w/o OLD_MCOUNT doesn't work. 
> I think that should be fixed rather than your DYNAMIC_FTRACE_WITH_REGS
> pulling in OLD_MCOUNT in order to repair DYNAMIC_FTRACE.
> 
> Especially since your implementation seems to require !OLD_MCOUNT...
> 
So making arch specific __ftrace_modify_code to be OLD_MCOUNT independent might 
fix 
DYNAMIC_FTRACE w/o OLD_MCOUNT and implicitly make DYNAMIC_FTRACE_WITH_REGS not 
dependent on
OLD_MCOUNT.

I will investigate this further today. Probably the whole dependancy between 
FUNCTION_TRACER
and OLD_MCOUNT will need to be changed/updated.
> >> Also, at least the ldmia ..., {..., sp, ...} insn needs a !THUMB2_KERNEL.
> >> 
> >> 
> >> >  select HAVE_EFFICIENT_UNALIGNED_ACCESS if (CPU_V6 || CPU_V6K || 
> >> > CPU_V7) && MMU
> >> >  select HAVE_EXIT_THREAD
> >> >  select HAVE_FTRACE_MCOUNT_RECORD if (!XIP_KERNEL)
> >> 
> >> 
> >> 
> >> > diff --git a/arch/arm/kernel/entry-ftrace.S 
> >> > b/arch/arm/kernel/entry-ftrace.S
> >> > index c73c403..3916dd6 100644
> >> > --- a/arch/arm/kernel/entry-ftrace.S
> >> > +++ b/arch/arm/kernel/entry-ftrace.S
> >> > @@ -92,12 +92,78 @@
> >> >  2:  mcount_exit
> >> >  .endm
> >> >  
> >> > +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> >> > +
> >> > +.macro __ftrace_regs_caller
> >> > +
> >> > +sub sp, sp, #8  @ space for CPSR and OLD_R0 (not used)
> >> > +
> >> > +add ip, sp, #12 @ move in IP the value of SP as it was
> >> > +@ before the push {lr} of the mcount 
> >> > mechanism
> >> > +stmdb   sp!, {ip,lr,pc}
> >> > +stmdb   sp!, {r0-r11,lr}
> >> > +
> >> > +@ stack content at this point:
> >> > +@ 0  4  48   52   56   60   6468   72
> >> > +@ R0 | R1 | ... | LR | SP + 4 | LR | PC | PSR | OLD_R0 | 
> >> > previous LR |
> >> 
> >> Being a constant, the saved pc is not very useful, I think.
> > So you're saying skip it ? But you still need to leave space for it. So why 
> > not
> > just save it even if the value is not useful?
> 
> No, no, I don't want to skip it. I'd just prefer to have the pt_regs'
> ->ARM_lr and ->ARM_pc slots filled with different, perhaps more useful
> values:
> 
> >> 
> >> Wouldn't it be better (and more consistent with other archs) to have
> >> 
> >>   pt_regs->ARM_lr = original lr
> >>   pt_refs->ARM_pc = current lr
> >> 
> >> instead?
> 
> The stack would look like this then
> 
> @ ...   | ARM_ip | ARM_sp | ARM_lr  | ARM_pc  | ...  |
> @ 0  4  48   52   56606468   
> 72
> @ R0 | R1 | ... | LR | SP + 4 | original LR | original PC | PSR | OLD_R0 
> | original LR |
> 
> I.e. the pt_regs would capture almost the full context of the
> instrumented function (except for ip).
> 
So basicly what you are saying is:
- instead of current LR save original LR (previous one saved in instrumented 
function epilog)
- instead of current PC save original PC (previous one saved in instrumented 
f

Re: [PATCHv4] arm: ftrace: Adds support for CONFIG_DYNAMIC_FTRACE_WITH_REGS

2017-02-28 Thread Nicolai Stange
Hi Abel,

On Tue, Feb 28 2017, Abel Vesa wrote:

> On Mon, Feb 27, 2017 at 04:52:06PM +0100, Nicolai Stange wrote:
>> On Fri, Feb 24 2017, Abel Vesa wrote:
>> 
>> 
>> 
>> > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
>> > index fda6a46..877df5b 100644
>> > --- a/arch/arm/Kconfig
>> > +++ b/arch/arm/Kconfig
>> > @@ -55,6 +55,7 @@ config ARM
>> >select HAVE_DMA_API_DEBUG
>> >select HAVE_DMA_CONTIGUOUS if MMU
>> >select HAVE_DYNAMIC_FTRACE if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU
>> > +  select HAVE_DYNAMIC_FTRACE_WITH_REGS if HAVE_DYNAMIC_FTRACE && 
>> > OLD_MCOUNT
>> 
>> 
>> AFAICS, your code depends on the __gnu_mcount_nc calling conventions,
>> i.e. on gcc pushing the original lr on the stack. In particular, there's
>> no implementation of a ftrace_regs_caller_old or so.
>> 
>> So shouldn't this read as !OLD_MCOUNT instead?
> The implementation of __ftrace_modify_code which sets the kernel text to rw
> needs OLD_MCOUNT (that is, the arch specific one, not the generic one).

You're right that ARM's implementation of __ftrace_modify_code() is hidden
within an #ifdef CONFIG_OLD_MCOUNT.

But,
 - its implementation doesn't "need" or depend on OLD_MCOUNT
 - and it's true in general that the kernel text must be made writable
   before ftrace_modify_all_code() attempts to write to it.

So, I bet that the set_kernel_text_rw()-calling ARM implementations of
arch_ftrace_update_code() and __ftrace_modify_code() resp. have been
inserted under that CONFIG_OLD_MCOUNT #ifdef by mistake with commit
80d6b0c2eed2 ("ARM: mm: allow text and rodata sections to be
read-only").

In conclusion, I claim that DYNAMIC_FTRACE w/o OLD_MCOUNT had been
broken before your patch already. I didn't explicitly test that though.

I think that should be fixed rather than your DYNAMIC_FTRACE_WITH_REGS
pulling in OLD_MCOUNT in order to repair DYNAMIC_FTRACE.

Especially since your implementation seems to require !OLD_MCOUNT...

>> Also, at least the ldmia ..., {..., sp, ...} insn needs a !THUMB2_KERNEL.
>> 
>> 
>> >select HAVE_EFFICIENT_UNALIGNED_ACCESS if (CPU_V6 || CPU_V6K || CPU_V7) 
>> > && MMU
>> >select HAVE_EXIT_THREAD
>> >select HAVE_FTRACE_MCOUNT_RECORD if (!XIP_KERNEL)
>> 
>> 
>> 
>> > diff --git a/arch/arm/kernel/entry-ftrace.S 
>> > b/arch/arm/kernel/entry-ftrace.S
>> > index c73c403..3916dd6 100644
>> > --- a/arch/arm/kernel/entry-ftrace.S
>> > +++ b/arch/arm/kernel/entry-ftrace.S
>> > @@ -92,12 +92,78 @@
>> >  2:mcount_exit
>> >  .endm
>> >  
>> > +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
>> > +
>> > +.macro __ftrace_regs_caller
>> > +
>> > +  sub sp, sp, #8  @ space for CPSR and OLD_R0 (not used)
>> > +
>> > +  add ip, sp, #12 @ move in IP the value of SP as it was
>> > +  @ before the push {lr} of the mcount mechanism
>> > +  stmdb   sp!, {ip,lr,pc}
>> > +  stmdb   sp!, {r0-r11,lr}
>> > +
>> > +  @ stack content at this point:
>> > +  @ 0  4  48   52   56   60   6468   72
>> > +  @ R0 | R1 | ... | LR | SP + 4 | LR | PC | PSR | OLD_R0 | previous LR |
>> 
>> Being a constant, the saved pc is not very useful, I think.
> So you're saying skip it ? But you still need to leave space for it. So why 
> not
> just save it even if the value is not useful?

No, no, I don't want to skip it. I'd just prefer to have the pt_regs'
->ARM_lr and ->ARM_pc slots filled with different, perhaps more useful
values:

>> 
>> Wouldn't it be better (and more consistent with other archs) to have
>> 
>>   pt_regs->ARM_lr = original lr
>>   pt_refs->ARM_pc = current lr
>> 
>> instead?

The stack would look like this then

@ ...   | ARM_ip | ARM_sp | ARM_lr  | ARM_pc  | ...  |
@ 0  4  48   52   56606468   72
@ R0 | R1 | ... | LR | SP + 4 | original LR | original PC | PSR | OLD_R0 | 
original LR |

I.e. the pt_regs would capture almost the full context of the
instrumented function (except for ip).

Thanks,

Nicolai


Re: [PATCHv4] arm: ftrace: Adds support for CONFIG_DYNAMIC_FTRACE_WITH_REGS

2017-02-27 Thread Abel Vesa
On Mon, Feb 27, 2017 at 04:52:06PM +0100, Nicolai Stange wrote:
> Hi Abel,
> 
> On Fri, Feb 24 2017, Abel Vesa wrote:
> 
> 
> 
> > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > index fda6a46..877df5b 100644
> > --- a/arch/arm/Kconfig
> > +++ b/arch/arm/Kconfig
> > @@ -55,6 +55,7 @@ config ARM
> > select HAVE_DMA_API_DEBUG
> > select HAVE_DMA_CONTIGUOUS if MMU
> > select HAVE_DYNAMIC_FTRACE if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU
> > +   select HAVE_DYNAMIC_FTRACE_WITH_REGS if HAVE_DYNAMIC_FTRACE && 
> > OLD_MCOUNT
> 
> 
> AFAICS, your code depends on the __gnu_mcount_nc calling conventions,
> i.e. on gcc pushing the original lr on the stack. In particular, there's
> no implementation of a ftrace_regs_caller_old or so.
> 
> So shouldn't this read as !OLD_MCOUNT instead?
The implementation of __ftrace_modify_code which sets the kernel text to rw
needs OLD_MCOUNT (that is, the arch specific one, not the generic one).
> 
> Also, at least the ldmia ..., {..., sp, ...} insn needs a !THUMB2_KERNEL.
> 
> 
> > select HAVE_EFFICIENT_UNALIGNED_ACCESS if (CPU_V6 || CPU_V6K || CPU_V7) 
> > && MMU
> > select HAVE_EXIT_THREAD
> > select HAVE_FTRACE_MCOUNT_RECORD if (!XIP_KERNEL)
> 
> 
> 
> > diff --git a/arch/arm/kernel/entry-ftrace.S b/arch/arm/kernel/entry-ftrace.S
> > index c73c403..3916dd6 100644
> > --- a/arch/arm/kernel/entry-ftrace.S
> > +++ b/arch/arm/kernel/entry-ftrace.S
> > @@ -92,12 +92,78 @@
> >  2: mcount_exit
> >  .endm
> >  
> > +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> > +
> > +.macro __ftrace_regs_caller
> > +
> > +   sub sp, sp, #8  @ space for CPSR and OLD_R0 (not used)
> > +
> > +   add ip, sp, #12 @ move in IP the value of SP as it was
> > +   @ before the push {lr} of the mcount mechanism
> > +   stmdb   sp!, {ip,lr,pc}
> > +   stmdb   sp!, {r0-r11,lr}
> > +
> > +   @ stack content at this point:
> > +   @ 0  4  48   52   56   60   6468   72
> > +   @ R0 | R1 | ... | LR | SP + 4 | LR | PC | PSR | OLD_R0 | previous LR |
> 
> Being a constant, the saved pc is not very useful, I think.
So you're saying skip it ? But you still need to leave space for it. So why not
just save it even if the value is not useful?
> 
> Wouldn't it be better (and more consistent with other archs) to have
> 
>   pt_regs->ARM_lr = original lr
>   pt_refs->ARM_pc = current lr
> 
> instead?
> 
> A (hypothetical) klp_arch_set_pc(struct pt_regs *regs, unsigned long ip)
> would do the more intuitive
> 
>   regs->ARM_pc = ip;
> 
> rather than a
> 
>   regs->ARM_lr = ip
> 
> then.
You are right. There is a subsequent patch I'm currently working on that will 
enable livepatch and will bring an implementation for klp_arch_set_pc as you 
described it. I still don't get what is wrong with the code though?

> 
> In addition, the original lr register would be made available to ftrace
> handlers this way.
> 
> 
> > +   mov r3, sp  @ struct pt_regs*
> > +   ldr r2, =function_trace_op
> > +   ldr r2, [r2]@ pointer to the current
> > +   @ function tracing op
> > +   ldr r1, [sp, #PT_REGS_SIZE] @ lr of instrumented func
> > +   mcount_adjust_addr  r0, lr  @ instrumented function
> > +
> > +   .globl ftrace_regs_call
> > +ftrace_regs_call:
> > +   bl  ftrace_stub
> > +
> > +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > +   .globl ftrace_graph_regs_call
> > +ftrace_graph_regs_call:
> > +   mov r0, r0
> > +#endif
> > +   @ pop saved regs
> > +   @ first, get the previous LR value from stack
> > +   ldr lr, [sp, #PT_REGS_SIZE]
> > +   @ now pop the rest of the saved registers INCLUDING stack pointer
> > +   ldmia   sp, {r0-r11, ip, sp, pc}
> > +.endm
> > +
> 
> 
> > +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > +.macro __ftrace_graph_regs_caller
> > +
> > +   sub r0, fp, #4  @ lr of instrumented routine (parent)
> > +
> > +   @ called from __ftrace_regs_caller
> > +   ldr r1, [sp, #S_LR] @ instrumented routine (func)
> > +   mcount_adjust_addr  r1, r1
> > +
> > +   sub r2, r0, #4  @ frame pointer
> 
> Given that r2 is prepare_ftrace_return()'s frame_pointer argument, is
> 
>   r2 = fp - 4 - 4 = fp - 8
> 
> really correct / what is wanted here?
> 
You are right. 
-   sub r2, r0, #4  @ frame pointer
+   mov r2, fp  @ frame pointer

> > +   bl  prepare_ftrace_return
> > +
> > +   @ pop registers saved in ftrace_regs_caller
> > +   @ first, get the previous LR value from stack
> > +   ldr lr, [sp, #PT_REGS_SIZE]
> > +   @ now pop the rest of the saved registers INCLUDING stack pointer
> > +   ldmia   sp, {r0-r11, ip, sp, pc}
> > +.endm
> > +#endif
> > +#endif
> > +
> 
> 
> 
> 
> Thanks,
> 
> Nicolai

Thanks



Re: [PATCHv4] arm: ftrace: Adds support for CONFIG_DYNAMIC_FTRACE_WITH_REGS

2017-02-27 Thread Nicolai Stange
Hi Abel,

On Fri, Feb 24 2017, Abel Vesa wrote:



> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index fda6a46..877df5b 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -55,6 +55,7 @@ config ARM
>   select HAVE_DMA_API_DEBUG
>   select HAVE_DMA_CONTIGUOUS if MMU
>   select HAVE_DYNAMIC_FTRACE if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU
> + select HAVE_DYNAMIC_FTRACE_WITH_REGS if HAVE_DYNAMIC_FTRACE && 
> OLD_MCOUNT


AFAICS, your code depends on the __gnu_mcount_nc calling conventions,
i.e. on gcc pushing the original lr on the stack. In particular, there's
no implementation of a ftrace_regs_caller_old or so.

So shouldn't this read as !OLD_MCOUNT instead?

Also, at least the ldmia ..., {..., sp, ...} insn needs a !THUMB2_KERNEL.


>   select HAVE_EFFICIENT_UNALIGNED_ACCESS if (CPU_V6 || CPU_V6K || CPU_V7) 
> && MMU
>   select HAVE_EXIT_THREAD
>   select HAVE_FTRACE_MCOUNT_RECORD if (!XIP_KERNEL)



> diff --git a/arch/arm/kernel/entry-ftrace.S b/arch/arm/kernel/entry-ftrace.S
> index c73c403..3916dd6 100644
> --- a/arch/arm/kernel/entry-ftrace.S
> +++ b/arch/arm/kernel/entry-ftrace.S
> @@ -92,12 +92,78 @@
>  2:   mcount_exit
>  .endm
>  
> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> +
> +.macro __ftrace_regs_caller
> +
> + sub sp, sp, #8  @ space for CPSR and OLD_R0 (not used)
> +
> + add ip, sp, #12 @ move in IP the value of SP as it was
> + @ before the push {lr} of the mcount mechanism
> + stmdb   sp!, {ip,lr,pc}
> + stmdb   sp!, {r0-r11,lr}
> +
> + @ stack content at this point:
> + @ 0  4  48   52   56   60   6468   72
> + @ R0 | R1 | ... | LR | SP + 4 | LR | PC | PSR | OLD_R0 | previous LR |

Being a constant, the saved pc is not very useful, I think.

Wouldn't it be better (and more consistent with other archs) to have

  pt_regs->ARM_lr = original lr
  pt_refs->ARM_pc = current lr

instead?

A (hypothetical) klp_arch_set_pc(struct pt_regs *regs, unsigned long ip)
would do the more intuitive

  regs->ARM_pc = ip;

rather than a

  regs->ARM_lr = ip

then.

In addition, the original lr register would be made available to ftrace
handlers this way.


> + mov r3, sp  @ struct pt_regs*
> + ldr r2, =function_trace_op
> + ldr r2, [r2]@ pointer to the current
> + @ function tracing op
> + ldr r1, [sp, #PT_REGS_SIZE] @ lr of instrumented func
> + mcount_adjust_addr  r0, lr  @ instrumented function
> +
> + .globl ftrace_regs_call
> +ftrace_regs_call:
> + bl  ftrace_stub
> +
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> + .globl ftrace_graph_regs_call
> +ftrace_graph_regs_call:
> + mov r0, r0
> +#endif
> + @ pop saved regs
> + @ first, get the previous LR value from stack
> + ldr lr, [sp, #PT_REGS_SIZE]
> + @ now pop the rest of the saved registers INCLUDING stack pointer
> + ldmia   sp, {r0-r11, ip, sp, pc}
> +.endm
> +


> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> +.macro __ftrace_graph_regs_caller
> +
> + sub r0, fp, #4  @ lr of instrumented routine (parent)
> +
> + @ called from __ftrace_regs_caller
> + ldr r1, [sp, #S_LR] @ instrumented routine (func)
> + mcount_adjust_addr  r1, r1
> +
> + sub r2, r0, #4  @ frame pointer

Given that r2 is prepare_ftrace_return()'s frame_pointer argument, is

  r2 = fp - 4 - 4 = fp - 8

really correct / what is wanted here?

> + bl  prepare_ftrace_return
> +
> + @ pop registers saved in ftrace_regs_caller
> + @ first, get the previous LR value from stack
> + ldr lr, [sp, #PT_REGS_SIZE]
> + @ now pop the rest of the saved registers INCLUDING stack pointer
> + ldmia   sp, {r0-r11, ip, sp, pc}
> +.endm
> +#endif
> +#endif
> +




Thanks,

Nicolai