Re: 3 quick questions about stack alignment for powerpc (32-bit) signal handlers [the change that caused misaligned]

2016-02-02 Thread Justin Hibbits
Good catch!  I'll commit the change tonight.

- Justin

On Tue, Feb 2, 2016 at 3:48 AM, Mark Millard  wrote:
> I tried the change to -32 and 32 (from -20 and 20) on/for the powerpc 
> (32-bit) PowerMac that I use and the results were:
>
> A) "info frame" in gdb shows signal handlers are now started with 16-byte 
> aligned stack frames. (Applies to gcc 4.2.1 based contexts too, not just to 
> the clang 3.8.0 ones with the __vfprintf-tied segmentation faults during 
> signals.)
>
> and. . .
>
> B) The "clang 3.8.0 compiled __vfprintf" segmentation faults in libc/stdio 
> library code during signal handlers that use such code no longer happen 
> because the alignment matches the code requirements.
>
> I've added this information to Bug 206810.
>
>
> (Note: There are a couple of segmentation fault contexts that I've never tied 
> down to any specific property: no discovered evidence of signal handler 
> involvement or of __vfprintf involvement, for example. These are still a 
> problem. But where I had tied the faults to signal handlers using __vfprintf 
> now instead work fine in my experimental clang 3.8.0 based builds.)
>
>
> ===
> Mark Millard
> markmi at dsl-only.net
>
> On 2016-Feb-1, at 12:11 AM, Mark Millard  wrote:
>
> The -16/16 code below produced correct alignment but too little space.
>
> The -20/20 code below produces enough space but misalignment.
>
> To maintain 16-byte alignment while increasing the space would have required 
> going from -16/16 to -32/32. At least that is how I understand this code.
>
>
>> Index: sys/powerpc/powerpc/sigcode32.S
>> ===
>> --- sys/powerpc/powerpc/sigcode32.S 
>> (.../head/sys/powerpc/powerpc/sigcode32.S)  (revision 209975)
>> +++ sys/powerpc/powerpc/sigcode32.S 
>> (.../projects/clang380-import/sys/powerpc/powerpc/sigcode32.S)  (working 
>> copy)
>> @@ -45,9 +45,9 @@
>>  */
>>.globl  CNAME(sigcode32),CNAME(szsigcode32)
>> CNAME(sigcode32):
>> -   addi1,1,-16 /* reserved space for callee */
>> +   addi1,1,-20 /* reserved space for callee */
>>blrl
>> -   addi3,1,16+SF_UC/* restore sp, and get >sf_uc 
>> */
>> +   addi3,1,20+SF_UC/* restore sp, and get >sf_uc 
>> */
>>li  0,SYS_sigreturn
>>sc  /* sigreturn(scp) */
>>li  0,SYS_exit
>
>
>
> The "working copy" is -r266778 from 2014-May-27.
>
> -r209975 is from 2010-Jul-13.
>
>
> ===
> Mark Millard
> markmi at dsl-only.net
>
> On 2016-Jan-31, at 10:58 PM, Mark Millard  wrote:
>
> Just a correction to a sentence that I wrote. I had written:
>
>> Frame at:0x...90 vs. 0x...1c
>> call by frame:   0x...b0 vs. 0x...1c
>> Arglist at:  0x...70 vs. 0x...dc
>> Locals at:   0x...70 vs. 0x...dc
>> Previous frame's sp: 0x...90 vs. 0x...1c
>>
>> It looks like 4 additional pad bytes on the user/process stack are needed to 
>> get back to alignment.
>
> Of course the figures on the right need to get smaller, not larger: The stack 
> grows towards smaller addresses. So to get to 0x...0 on the right I should 
> have said:
>
> It looks like 12 additional pad bytes on the user/process stack are needed to 
> get back to alignment.
>
> That would produce:
>
> Frame at:0x...90 vs. 0x...10
> call by frame:   0x...b0 vs. 0x...10
> Arglist at:  0x...70 vs. 0x...d0
> Locals at:   0x...70 vs. 0x...d0
> Previous frame's sp: 0x...90 vs. 0x...10
>
> ===
> Mark Millard
> markmi at dsl-only.net
>
> On 2016-Jan-31, at 10:47 PM, Mark Millard  wrote:
>
> More evidence: By adding "break raise" and then using "info frame" to show 
> the alignment at that point I can show that the later signal delivery changes 
> the alignment on the user process stack compared to when raise was called. 
> (Later I show the same for thr_kill.)
>
>> Breakpoint 2, __raise (s=29) at /usr/src/lib/libc/gen/raise.c:50
>> warning: Source file is more recent than executable.
>> 50if (__sys_thr_self() == -1)
>> (gdb) info frame
>> Stack level 0, frame at 0xdc90:
>> pc = 0x41904630 in __raise (/usr/src/lib/libc/gen/raise.c:50); saved pc = 
>> 0x1800774
>> called by frame at 0xdcb0
>> source language c.
>> Arglist at 0xdc70, args: s=29
>> Locals at 0xdc70, Previous frame's sp is 0xdc90
>> Saved registers:
>> r29 at 0xdc84, r30 at 0xdc88, r31 at 0xdc8c, pc at 0xdc94, 
>> lr at 0xdc94
>> (gdb) cont
>> Continuing.
>>
>> Program received signal SIGINFO, Information request.
>>
>> Breakpoint 1, 0x018006d0 in handler ()
>> (gdb) info frame
>> Stack level 0, frame at 0xd71c:
>> pc = 0x18006d0 in handler; saved pc = 0xe008
>> called by frame at 0xd71c
>> Arglist at 0xd6dc, args:
>> Locals at 0xd6dc, Previous frame's sp is 0xd71c
>> Saved registers:
>> r31 at 0xd718, pc at 0xd720, lr at 

Re: 3 quick questions about stack alignment for powerpc (32-bit) signal handlers [the change that caused misaligned]

2016-02-02 Thread Konstantin Belousov
On Tue, Feb 02, 2016 at 10:05:16AM -0600, Justin Hibbits wrote:
> Good catch!  I'll commit the change tonight.
I looked once at the powerpc sigsend(), and I think that it has an
issue. The usfp is calculated by taking the stack pointer at the time
of signal delivery and substracting the sigframe size. This means that
a transient misalignment during some code (e.g. leaf function) is
transferred to the signal handler execution.

Other arches explicitely realign stack pointer for the signal
frame before the frame is formed.

I am not sure if the problem reported in the thread is caused by this
or not, but forced realignment in sendsig() is required for ABI compliance.

> 
> - Justin
> 
> On Tue, Feb 2, 2016 at 3:48 AM, Mark Millard  wrote:
> > I tried the change to -32 and 32 (from -20 and 20) on/for the powerpc 
> > (32-bit) PowerMac that I use and the results were:
> >
> > A) "info frame" in gdb shows signal handlers are now started with 16-byte 
> > aligned stack frames. (Applies to gcc 4.2.1 based contexts too, not just to 
> > the clang 3.8.0 ones with the __vfprintf-tied segmentation faults during 
> > signals.)
> >
> > and. . .
> >
> > B) The "clang 3.8.0 compiled __vfprintf" segmentation faults in libc/stdio 
> > library code during signal handlers that use such code no longer happen 
> > because the alignment matches the code requirements.
> >
> > I've added this information to Bug 206810.
> >
> >
> > (Note: There are a couple of segmentation fault contexts that I've never 
> > tied down to any specific property: no discovered evidence of signal 
> > handler involvement or of __vfprintf involvement, for example. These are 
> > still a problem. But where I had tied the faults to signal handlers using 
> > __vfprintf now instead work fine in my experimental clang 3.8.0 based 
> > builds.)
> >
> >
> > ===
> > Mark Millard
> > markmi at dsl-only.net
> >
> > On 2016-Feb-1, at 12:11 AM, Mark Millard  wrote:
> >
> > The -16/16 code below produced correct alignment but too little space.
> >
> > The -20/20 code below produces enough space but misalignment.
> >
> > To maintain 16-byte alignment while increasing the space would have 
> > required going from -16/16 to -32/32. At least that is how I understand 
> > this code.
> >
> >
> >> Index: sys/powerpc/powerpc/sigcode32.S
> >> ===
> >> --- sys/powerpc/powerpc/sigcode32.S 
> >> (.../head/sys/powerpc/powerpc/sigcode32.S)  (revision 209975)
> >> +++ sys/powerpc/powerpc/sigcode32.S 
> >> (.../projects/clang380-import/sys/powerpc/powerpc/sigcode32.S)  (working 
> >> copy)
> >> @@ -45,9 +45,9 @@
> >>  */
> >>.globl  CNAME(sigcode32),CNAME(szsigcode32)
> >> CNAME(sigcode32):
> >> -   addi1,1,-16 /* reserved space for callee */
> >> +   addi1,1,-20 /* reserved space for callee */
> >>blrl
> >> -   addi3,1,16+SF_UC/* restore sp, and get 
> >> >sf_uc */
> >> +   addi3,1,20+SF_UC/* restore sp, and get 
> >> >sf_uc */
> >>li  0,SYS_sigreturn
> >>sc  /* sigreturn(scp) */
> >>li  0,SYS_exit
> >
> >
> >
> > The "working copy" is -r266778 from 2014-May-27.
> >
> > -r209975 is from 2010-Jul-13.
> >
> >
> > ===
> > Mark Millard
> > markmi at dsl-only.net
> >
> > On 2016-Jan-31, at 10:58 PM, Mark Millard  wrote:
> >
> > Just a correction to a sentence that I wrote. I had written:
> >
> >> Frame at:0x...90 vs. 0x...1c
> >> call by frame:   0x...b0 vs. 0x...1c
> >> Arglist at:  0x...70 vs. 0x...dc
> >> Locals at:   0x...70 vs. 0x...dc
> >> Previous frame's sp: 0x...90 vs. 0x...1c
> >>
> >> It looks like 4 additional pad bytes on the user/process stack are needed 
> >> to get back to alignment.
> >
> > Of course the figures on the right need to get smaller, not larger: The 
> > stack grows towards smaller addresses. So to get to 0x...0 on the right I 
> > should have said:
> >
> > It looks like 12 additional pad bytes on the user/process stack are needed 
> > to get back to alignment.
> >
> > That would produce:
> >
> > Frame at:0x...90 vs. 0x...10
> > call by frame:   0x...b0 vs. 0x...10
> > Arglist at:  0x...70 vs. 0x...d0
> > Locals at:   0x...70 vs. 0x...d0
> > Previous frame's sp: 0x...90 vs. 0x...10
> >
> > ===
> > Mark Millard
> > markmi at dsl-only.net
> >
> > On 2016-Jan-31, at 10:47 PM, Mark Millard  wrote:
> >
> > More evidence: By adding "break raise" and then using "info frame" to show 
> > the alignment at that point I can show that the later signal delivery 
> > changes the alignment on the user process stack compared to when raise was 
> > called. (Later I show the same for thr_kill.)
> >
> >> Breakpoint 2, __raise (s=29) at /usr/src/lib/libc/gen/raise.c:50
> >> warning: Source file is more recent than executable.
> >> 50if 

Re: 3 quick questions about stack alignment for powerpc (32-bit) signal handlers [the change that caused misaligned]

2016-02-02 Thread Justin Hibbits
On Tue, Feb 2, 2016 at 10:13 AM, Konstantin Belousov  wrote:
> On Tue, Feb 02, 2016 at 10:05:16AM -0600, Justin Hibbits wrote:
>> Good catch!  I'll commit the change tonight.
> I looked once at the powerpc sigsend(), and I think that it has an
> issue. The usfp is calculated by taking the stack pointer at the time
> of signal delivery and substracting the sigframe size. This means that
> a transient misalignment during some code (e.g. leaf function) is
> transferred to the signal handler execution.
>
> Other arches explicitely realign stack pointer for the signal
> frame before the frame is formed.
>
> I am not sure if the problem reported in the thread is caused by this
> or not, but forced realignment in sendsig() is required for ABI compliance.

Good point.  Currently the assumption is that the stack will always be
16-byte aligned, which is required per ABI.

Since there's no push/pop, only full frame creation/destruction, it
hasn't bitten us yet, but it should be fixed.  It's not the cause of
this bug, though.  This bug is caused after sendsig(), in the sigcode
trampoline in user space.

- Justin
___
freebsd-toolchain@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-toolchain
To unsubscribe, send any mail to "freebsd-toolchain-unsubscr...@freebsd.org"


Re: 3 quick questions about stack alignment for powerpc (32-bit) signal handlers [the change that caused misaligned]

2016-02-01 Thread Mark Millard
The -16/16 code below produced correct alignment but too little space.

The -20/20 code below produces enough space but misalignment.

To maintain 16-byte alignment while increasing the space would have required 
going from -16/16 to -32/32. At least that is how I understand this code.


Index: sys/powerpc/powerpc/sigcode32.S
===
--- sys/powerpc/powerpc/sigcode32.S 
(.../head/sys/powerpc/powerpc/sigcode32.S)  (revision 209975)
+++ sys/powerpc/powerpc/sigcode32.S 
(.../projects/clang380-import/sys/powerpc/powerpc/sigcode32.S)  (working copy)
@@ -45,9 +45,9 @@
  */
.globl  CNAME(sigcode32),CNAME(szsigcode32)
 CNAME(sigcode32):
-   addi1,1,-16 /* reserved space for callee */
+   addi1,1,-20 /* reserved space for callee */
blrl
-   addi3,1,16+SF_UC/* restore sp, and get >sf_uc */
+   addi3,1,20+SF_UC/* restore sp, and get >sf_uc */
li  0,SYS_sigreturn
sc  /* sigreturn(scp) */
li  0,SYS_exit



The "working copy" is -r266778 from 2014-May-27.

-r209975 is from 2010-Jul-13.


===
Mark Millard
markmi at dsl-only.net

On 2016-Jan-31, at 10:58 PM, Mark Millard  wrote:

Just a correction to a sentence that I wrote. I had written:

> Frame at:0x...90 vs. 0x...1c
> call by frame:   0x...b0 vs. 0x...1c
> Arglist at:  0x...70 vs. 0x...dc
> Locals at:   0x...70 vs. 0x...dc
> Previous frame's sp: 0x...90 vs. 0x...1c
> 
> It looks like 4 additional pad bytes on the user/process stack are needed to 
> get back to alignment.

Of course the figures on the right need to get smaller, not larger: The stack 
grows towards smaller addresses. So to get to 0x...0 on the right I should have 
said:

It looks like 12 additional pad bytes on the user/process stack are needed to 
get back to alignment.

That would produce:

Frame at:0x...90 vs. 0x...10
call by frame:   0x...b0 vs. 0x...10
Arglist at:  0x...70 vs. 0x...d0
Locals at:   0x...70 vs. 0x...d0
Previous frame's sp: 0x...90 vs. 0x...10

===
Mark Millard
markmi at dsl-only.net

On 2016-Jan-31, at 10:47 PM, Mark Millard  wrote:

More evidence: By adding "break raise" and then using "info frame" to show the 
alignment at that point I can show that the later signal delivery changes the 
alignment on the user process stack compared to when raise was called. (Later I 
show the same for thr_kill.)

> Breakpoint 2, __raise (s=29) at /usr/src/lib/libc/gen/raise.c:50
> warning: Source file is more recent than executable.
> 50if (__sys_thr_self() == -1)
> (gdb) info frame
> Stack level 0, frame at 0xdc90:
> pc = 0x41904630 in __raise (/usr/src/lib/libc/gen/raise.c:50); saved pc = 
> 0x1800774
> called by frame at 0xdcb0
> source language c.
> Arglist at 0xdc70, args: s=29
> Locals at 0xdc70, Previous frame's sp is 0xdc90
> Saved registers:
> r29 at 0xdc84, r30 at 0xdc88, r31 at 0xdc8c, pc at 0xdc94, lr 
> at 0xdc94
> (gdb) cont
> Continuing.
> 
> Program received signal SIGINFO, Information request.
> 
> Breakpoint 1, 0x018006d0 in handler ()
> (gdb) info frame
> Stack level 0, frame at 0xd71c:
> pc = 0x18006d0 in handler; saved pc = 0xe008
> called by frame at 0xd71c
> Arglist at 0xd6dc, args: 
> Locals at 0xd6dc, Previous frame's sp is 0xd71c
> Saved registers:
> r31 at 0xd718, pc at 0xd720, lr at 0xd720

Note the difference (raise before delivery vs. handler via delivery):

Frame at:0x...90 vs. 0x...1c
call by frame:   0x...b0 vs. 0x...1c
Arglist at:  0x...70 vs. 0x...dc
Locals at:   0x...70 vs. 0x...dc
Previous frame's sp: 0x...90 vs. 0x...1c

It looks like 4 additional pad bytes on the user/process stack are needed to 
get back to alignment.

[The span of addresses seems to be about: 0xdc90-0xd6dc==0x5B4==1460 
(raise's "frame at" minus handler's "Locals at").]


If I look at the frame for "break thr_kill" it also still shows an aligned 
user/process stack before the delivery:

> Breakpoint 3, 0x419046a0 in thr_kill () from /lib/libc.so.7
> (gdb) info frame
> Stack level 0, frame at 0xdc70:
> pc = 0x419046a0 in thr_kill; saved pc = 0x41904650
> called by frame at 0xdc90
> Arglist at 0xdc70, args: 
> Locals at 0xdc70, Previous frame's sp is 0xdc70

(The relevant addresses are the same as raise showed.)


Reminder of the source program structure that uses the potentially frame/stack 
alignment sensitive libc/stdio library code:

> # more sig_snprintf_use_test.c 
> #include  // for signal, SIGINFO, SIG_ERR, raise.
> #include   // for snprintf
> 
> void handler(int sig)
> {
>   char buf[32];
>   snprintf(buf, sizeof buf, "%d", sig); // FreeBSD's world does such
> // things in some of its handlers.
> }
> 
>