Re: clang 3.8.0 can mess up __builtin_dwarf_cfa (), at least for TARGET_ARCH=armv6, powerpc, powerpc64: a bug 207325 update

Mark Millard Wed, 02 Mar 2016 20:58:39 -0800

I now have the explanation of the clang 3.8.0 buildworld's libcxxrt __cxa_throw 
related code not detecting the catch in main: main is skipped because of 
mishandling r31 references in the eh.frame information. The problem is distinct 
from the other reported problem(s).

libcxxrt ends up with (dwarfdump -v -v -F output for __cxa_throw as compiled by 
clang 3.8.0):

> <    0><0x00010620:0x00010794><__cxa_throw><fde offset 0x000006c0 length: 
> 0x00000028><eh aug data len 0x0>
>         0x00010620: <off cfa=00(r1) > 
>         0x00010634: <off cfa=48(r1) > <off r30=-8(cfa) > <off r31=-4(cfa) > 
> <off r65=04(cfa) > 
>         0x00010638: <off cfa=48(r31) > <off r25=-28(cfa) > <off r26=-24(cfa) 
> > <off r27=-20(cfa) > <off r28=-16(cfa) > <off r29=-12(cfa) > <off 
> r30=-8(cfa) > <off r31=-4(cfa) > <off r65=04(cfa) > 
>  fde section offset 1728 0x000006c0 cie offset for fde: 1732 0x000006c4
>          0 DW_CFA_advance_loc 20  (5 * 4)
>          1 DW_CFA_def_cfa_offset 48
>          3 DW_CFA_offset r31 -4  (1 * -4)
>          5 DW_CFA_offset r30 -8  (2 * -4)
>          7 DW_CFA_offset_extended_sf r65 4  (-1 * -4)
>         10 DW_CFA_advance_loc 4  (1 * 4)
>         11 DW_CFA_def_cfa_register r31
>         13 DW_CFA_offset r25 -28  (7 * -4)
>         15 DW_CFA_offset r26 -24  (6 * -4)
>         17 DW_CFA_offset r27 -20  (5 * -4)
>         19 DW_CFA_offset r28 -16  (4 * -4)
>         21 DW_CFA_offset r29 -12  (3 * -4)
>         23 DW_CFA_offset r30 -8  (2 * -4)
>         25 DW_CFA_nop
>         26 DW_CFA_nop

Note the cfa and r31 references in:

> 0x00010634: <off cfa=48(r1) >  . . . <off r31=-4(cfa) > . . .
> 0x00010638: <off cfa=48(r31) > . . . <off r31=-4(cfa) > . . .

The use of r31 to define cfa is from (in part) the clang++ 3.8.0 code 
generation using r31 as a frame pointer in additino to r1 as the stack pointer. 
The matching actual sequence of operations listed above is:

>          1 DW_CFA_def_cfa_offset 48
>          3 DW_CFA_offset r31 -4  (1 * -4)
> . . .
>         11 DW_CFA_def_cfa_register r31

The "1 DW_CFA_def_cfa_offset 48" just notes that r1 (the stack pointer) was 
decremented by 48 by the prior instruction so 48 needs to be added to the new 
r1 value to reference the same _Unwind_Context cfa value as the prior "<off 
cfa=00(r1) >" status does.

The "3 DW_CFA_offset r31 -4  (1 * -4)" was generated because (soon old) r31 
value was saved at address cfa-4 ("<off r31=-4(cfa) >"). This address to access 
what will be the old/saved r31 value is recorded in the _Unwind_Context reg[31].

The "11 DW_CFA_def_cfa_register r31" was generated because the prior 
instruction r31 was updated to be a copy of r1 for use as a frame pointer. Note 
that such does not change the _Unwind_Context cfa value. At this stage r1=r31 
and 48(r1)=48(r31) and such will hold until either r1 or r31 is changed in the 
routine (if either is).

The repeat of "<off r31=-4(cfa) >" on the "0x00010638: <off cfa=48(r31) >" line 
indicates that there is no change to where/how to find the pointer to the 
old/saved r31 value: no new DW_CFA_offset r31 "instruction" for interpretation.

[Note the messy mix of different r31's. gcc 4.2.1 does not (normally?) generate 
such TARGET_ARCH=powerpc code but clang++ 3.8.0 normally does. Thus clang++ 
touches an error that g++ 4.2.1 and the like normally do not.]

Unfortunately the above is not the interpretation given by the interpreter in 
libgcc_s:

"11 DW_CFA_def_cfa_register r31" instead accesses the old/saved r31 value via 
the pointer in _Unwind_Context reg[31] and then applies the offset 48 to that 
value.

The result is the wrong cfa value (which should not have changed at all) and 
all else is messed up after that. Since the old r31 value is an older Frame 
Pointer value, one frame has also been "skipped" in the process. (But the 
offset 48's involvement from there can produce pure junk for the cfa value that 
results.)

Code that sticks to cfa=OFFSET(r1) for the cfa will not see this error in the 
.eh_frame information's interpretation.

As for the lack of save/restore of some registers by _Unwind_RaiseException as 
generated by clang 3.8.0:

I did not earlier show how the the code involved picks to try to store to r3's 
(non-existent) save/restore place (as an example). I show that below.

The code is in #1 of:

> (gdb) bt
> #0  _Unwind_SetGR (context=<optimized out>, index=<optimized out>, 
> val=1105272880) at 
> /usr/src/gnu/lib/libgcc/../../../contrib/gcc/unwind-dw2.c:220
> #1  0x41b7139c in __cxxabiv1::__gxx_personality_v0 (version=<optimized out>, 
> actions=6, exception_class=<optimized out>, ue_header=0x41e12030, 
> context=0xffffd570)
>     at 
> /usr/src/gnu/lib/libsupc++/../../../contrib/libstdc++/libsupc++/eh_personality.cc:681
> #2  0x419915f8 in _Unwind_RaiseException_Phase2 (exc=<optimized out>, 
> context=<optimized out>) at 
> /usr/src/gnu/lib/libgcc/../../../contrib/gcc/unwind.inc:66
> #3  0x419911c0 in _Unwind_RaiseException (exc=<optimized out>) at 
> /usr/src/gnu/lib/libgcc/../../../contrib/gcc/unwind.inc:135
> #4  0x41b7075c in __cxxabiv1::__cxa_throw (obj=<optimized out>, 
> tinfo=<optimized out>, dest=<optimized out>) at 
> /usr/src/gnu/lib/libsupc++/../../../contrib/libstdc++/libsupc++/eh_throw.cc:71
> #5  0x01800920 in main () at exception_test.cpp:5

and looks like:

> 678     /* For targets with pointers smaller than the word size, we must 
> extend the
> 679        pointer, and this extension is target dependent.  */
> 680     _Unwind_SetGR (context, __builtin_eh_return_data_regno (0),
> 681                    __builtin_extend_pointer (ue_header));
> 682     _Unwind_SetGR (context, __builtin_eh_return_data_regno (1),
> 683                    handler_switch_value);
> 684     _Unwind_SetIP (context, landing_pad);

clang 3.8.0 agrees with gcc/g++ that __builtin_eh_return_data_regno (0) 
translates to the (int) value 3 (referencing r3) --despite clang 3.8.0 not 
providing anything that dwarfdump -v -v -F shows as saving/restoring r3 for 
_Unwind_RaiseException. (Similarly for some other such registers.)

===
Mark Millard
markmi at dsl-only.net

On 2016-Feb-29, at 11:10 PM, Mark Millard <markmi at dsl-only.net> wrote:

[TARGET_ARCH=powerpc context for all the evidence.]

I found another clang++ 3.8.0 vs. g++ 4.2.1 difference where the system 
libgcc_s depends on how it works for g++ 4.2.1 when clang 3.8.0 does not work 
the same way:

_Unwind_RaiseException is special in a way that makes it save and restore lots 
of registers it does not directly use. (I'm not sure what triggers having so 
many registers saved/restored.)

But gcc/g++ 4.2.1 saves and restores more registers than clang/clang++ 3.8.0 
does. That in turn leaves .eh_frame information for more registers for 
_Unwind_RaiseException for the gcc/g++ 4.2.1 context.

_Unwind_RaiseException from a clang++ 3.8.0 build of libgcc_s does not have 
save/restore for r3, r4, r5, r6, "r70" (from mfcr, dwarfdump notation). The C++ 
exception handling library code in libgcc_s depends on r3 (as one example). The 
pointer for r3 ends up being 0x0 and that causes a crash in examples that get 
that far using the system's libgcc_s.

_Unwind_RaiseException from a g++ 4.2.1 build of libgcc_s has save/restore for 
r3, r4, r5, r6, "r70" (from mfcr).

Later below I list one form of the specific evidence for the differences.

It may be that this and the __builtin_dwarf_cfa() "fix" covers all the problems 
for when libstdc++/libsupc++ are involved with the system libgcc_s instead of 
libc++/libcxxrt being involved.

In my view the registers to save/restore in routines like 
_Unwind_RaiseException should be considered as part of the overall ABI 
criteria. Under the rule "the TARGET_ARCH=powerpc ABI is always such that it is 
gcc/g++ 4.2.1 compatible", I take it that clang 3.8.0 is wrong for FreeBSD 
TARGET_ARCH=powerpc here: Another ABI violation.

TARGET_ARCH=powerpc64 and possibly others could have the same sort of issue. 
I've never gotten a clang/clang++ based TARGET_ARCH=powerpc64 as far as a 
complete buildworld. And for now I'm more interested in finding new types of 
errors for TARGET_ARCH=powerpc rather then what range of TARGET_ARCH's get a 
specific clang 3.8.0 problem.

Separately from the above I've shown that copying the following 3 files to a 
gcc 4.2.1 buildworld/buildkernel  TARGET_ARCH=powerpc context allows 
exception_test.clang++380.powerpc to run just fine:

> exception_test.clang++380.powerpc
> /usr/lib/libc++.so.1
> /lib/libcxxrt.so.1

(debug files for the libraries also copied)

That leaves the following libraries listed by ldd as being from the gcc 4.2.1 
buildworld:

> /lib/libm.so.5
> /lib/libc.so.7
> /lib/libgcc_s.so.1

> # ldd exception_test.clang++380.powerpc
> exception_test.clang++380.powerpc:
>       libc++.so.1 => /usr/lib/libc++.so.1 (0x4183e000)
>       libcxxrt.so.1 => /lib/libcxxrt.so.1 (0x41917000)
>       libm.so.5 => /lib/libm.so.5 (0x41942000)
>       libc.so.7 => /lib/libc.so.7 (0x41979000)
>       libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x41b1d000)

exception_test.clang++380.powerpc used with the clang 3.8.0 buildworld and its 
libgcc_s shows different behavior not likely to be explained by the 
_Unwind_RaiseException register save/restore differences. (The lack of some 
saves/restores would still be a problem if I get 
exception_test.clang++380.powerpc to get that far before doing something odd.)

I'm still trying to get evidence of the specific low-level problem for 
exception_test.clang++380.powerpc. It may be some time before I figure out 
anything useful.

Using some dwarfdump -v -v -F output for the evidence of register save/restore 
differences. . .

_Unwind_RaiseException from a g++ 4.2.1 build of libgcc_s has r3, r4, r5, r6, 
r70 (from mfcr). The library depends on r3 (as one example).

> fde section offset 1104 0x00000450 cie offset for fde: 1108 0x00000454
>         0 DW_CFA_advance_loc 8  (8 * 1)
>         1 DW_CFA_def_cfa_offset 3024
>         4 DW_CFA_advance_loc1 156
>         6 DW_CFA_offset r4 -232  (58 * -4)
>         8 DW_CFA_offset r3 -236  (59 * -4)
>        10 DW_CFA_offset r28 -160  (40 * -4)
>        12 DW_CFA_offset r27 -164  (41 * -4)
>        14 DW_CFA_offset r26 -168  (42 * -4)
>        16 DW_CFA_offset r25 -172  (43 * -4)
>        18 DW_CFA_offset r24 -176  (44 * -4)
>        20 DW_CFA_offset r23 -180  (45 * -4)
>        22 DW_CFA_offset r22 -184  (46 * -4)
>        24 DW_CFA_offset r21 -188  (47 * -4)
>        26 DW_CFA_offset r20 -192  (48 * -4)
>        28 DW_CFA_offset r19 -196  (49 * -4)
>        30 DW_CFA_offset r18 -200  (50 * -4)
>        32 DW_CFA_offset r17 -204  (51 * -4)
>        34 DW_CFA_offset r16 -208  (52 * -4)
>        36 DW_CFA_offset r15 -212  (53 * -4)
>        38 DW_CFA_offset r14 -216  (54 * -4)
>        40 DW_CFA_offset r63 -8  (2 * -4)
>        42 DW_CFA_offset r62 -16  (4 * -4)
>        44 DW_CFA_offset r61 -24  (6 * -4)
>        46 DW_CFA_offset r60 -32  (8 * -4)
>        48 DW_CFA_offset r59 -40  (10 * -4)
>        50 DW_CFA_offset r58 -48  (12 * -4)
>        52 DW_CFA_offset r57 -56  (14 * -4)
>        54 DW_CFA_offset r56 -64  (16 * -4)
>        56 DW_CFA_offset r55 -72  (18 * -4)
>        58 DW_CFA_offset r54 -80  (20 * -4)
>        60 DW_CFA_offset r53 -88  (22 * -4)
>        62 DW_CFA_offset r52 -96  (24 * -4)
>        64 DW_CFA_offset r51 -104  (26 * -4)
>        66 DW_CFA_offset r50 -112  (28 * -4)
>        68 DW_CFA_offset r49 -120  (30 * -4)
>        70 DW_CFA_offset r48 -128  (32 * -4)
>        72 DW_CFA_offset r47 -136  (34 * -4)
>        74 DW_CFA_offset r46 -144  (36 * -4)
>        76 DW_CFA_register r70 = r12
>        79 DW_CFA_offset_extended_sf r65 4  (-1 * -4)
>        82 DW_CFA_advance_loc 32  (32 * 1)
>        83 DW_CFA_offset r5 -228  (57 * -4)
>        85 DW_CFA_offset r31 -148  (37 * -4)
>        87 DW_CFA_offset r30 -152  (38 * -4)
>        89 DW_CFA_offset r29 -156  (39 * -4)
>        91 DW_CFA_offset_extended r70 -220  (55 * -4)
>        94 DW_CFA_offset r6 -224  (56 * -4)
>        96 DW_CFA_nop
>        97 DW_CFA_nop
>        98 DW_CFA_nop

_Unwind_RaiseException from clang++ 3.8.0 build of libgcc_s does not have has 
r3, r4, r5, r6, r70 (from mfcr). The library depends on r3 (as one example).

> fde section offset 692 0x000002b4 cie offset for fde: 696 0x000002b8
>         0 DW_CFA_advance_loc 20  (5 * 4)
>         1 DW_CFA_def_cfa_offset 2992
>         4 DW_CFA_offset r31 -148  (37 * -4)
>         6 DW_CFA_offset r30 -152  (38 * -4)
>         8 DW_CFA_offset_extended_sf r65 4  (-1 * -4)
>        11 DW_CFA_advance_loc 4  (1 * 4)
>        12 DW_CFA_def_cfa_register r31
>        14 DW_CFA_offset r14 -216  (54 * -4)
>        16 DW_CFA_offset r15 -212  (53 * -4)
>        18 DW_CFA_offset r16 -208  (52 * -4)
>        20 DW_CFA_offset r17 -204  (51 * -4)
>        22 DW_CFA_offset r18 -200  (50 * -4)
>        24 DW_CFA_offset r19 -196  (49 * -4)
>        26 DW_CFA_offset r20 -192  (48 * -4)
>        28 DW_CFA_offset r21 -188  (47 * -4)
>        30 DW_CFA_offset r22 -184  (46 * -4)
>        32 DW_CFA_offset r23 -180  (45 * -4)
>        34 DW_CFA_offset r24 -176  (44 * -4)
>        36 DW_CFA_offset r25 -172  (43 * -4)
>        38 DW_CFA_offset r26 -168  (42 * -4)
>        40 DW_CFA_offset r27 -164  (41 * -4)
>        42 DW_CFA_offset r28 -160  (40 * -4)
>        44 DW_CFA_offset r29 -156  (39 * -4)
>        46 DW_CFA_offset r30 -152  (38 * -4)
>        48 DW_CFA_offset r31 -148  (37 * -4)
>        50 DW_CFA_offset r46 -144  (36 * -4)
>        52 DW_CFA_offset r47 -136  (34 * -4)
>        54 DW_CFA_offset r48 -128  (32 * -4)
>        56 DW_CFA_offset r49 -120  (30 * -4)
>        58 DW_CFA_offset r50 -112  (28 * -4)
>        60 DW_CFA_offset r51 -104  (26 * -4)
>        62 DW_CFA_offset r52 -96  (24 * -4)
>        64 DW_CFA_offset r53 -88  (22 * -4)
>        66 DW_CFA_offset r54 -80  (20 * -4)
>        68 DW_CFA_offset r55 -72  (18 * -4)
>        70 DW_CFA_offset r56 -64  (16 * -4)
>        72 DW_CFA_offset r57 -56  (14 * -4)
>        74 DW_CFA_offset r58 -48  (12 * -4)
>        76 DW_CFA_offset r59 -40  (10 * -4)
>        78 DW_CFA_offset r60 -32  (8 * -4)
>        80 DW_CFA_offset r61 -24  (6 * -4)
>        82 DW_CFA_offset r62 -16  (4 * -4)
>        84 DW_CFA_offset r63 -8  (2 * -4)
>        86 DW_CFA_nop

===
Mark Millard
markmi at dsl-only.net

On 2016-Feb-29, at 3:20 AM, Mark Millard <markmi at dsl-only.net> wrote:

TARGET_ARCH=powerpc: Using Frame Depth 1 in "case Intrinsic::eh_dwarf_cfa" (and 
Offset 0 in "case Builtin::BI__builtin_dwarf_cfa") for 
PPCTargetLowering::LowerFRAMEADDR related use has allowed getting into 
_Unwind_RaiseException_Phase2 and __cxxabiv1::__gxx_personality_v0. The example 
is the 8 line example compiled under g++ 4.2.1 but then used under a buildworld 
that was built with clang 3.8.0:

# ldd exception_test.g++421.powerpc 
exception_test.g++421.powerpc:
        libstdc++.so.6 => /usr/local/lib/gcc49/libstdc++.so.6 (0x41840000)
        libm.so.5 => /lib/libm.so.5 (0x4196a000)
        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x419a1000)
        libc.so.7 => /lib/libc.so.7 (0x419c0000)

_Unwind_RaiseException_Phase2 is well past the point of the failure and crash 
from having Frame Depth 0 instead.

It is getting a SEGV during the _Unwind_SetGR called via:

710       /* For targets with pointers smaller than the word size, we must 
extend the
711          pointer, and this extension is target dependent.  */
712       _Unwind_SetGR (context, __builtin_eh_return_data_regno (0),
713                      __builtin_extend_pointer (ue_header));

for:

_Unwind_SetGR (context=0xffffd570, index=3, val=1105272896) at 
/usr/src/gnu/lib/libgcc/../../../contrib/gcc/unwind-dw2.c:207

context->reg[3] is 0x0 and so its use in the following gets the SEGV.

217       ptr = context->reg[index];
218     
219       if (size == sizeof(_Unwind_Ptr))
220         * (_Unwind_Ptr *) ptr = val;

I'm not going to try to analyze the source of this before getting some sleep.

For the 8 line program being compiled by clang++ 3.8.0 instead the results are 
different than the above and than the original behavior: The program does not 
crash abnormally but also does not find the catch clause that it should. The 
std::terminate gets its normal SIGABRT instead of an earlier SEGV.

Again I'm not going to try to analyze the details before getting some sleep.

But I will mention that I've also already submitted a report that libgcc_s does 
not completely implement DW_CFA_remember_state and DW_CFA_restore_state and 
that the code generated on powerpc64 touches the defect and so ends up with 
misbehavior. These might be similar.

===
Mark Millard
markmi at dsl-only.net

On 2016-Feb-28, at 10:13 PM, Mark Millard <markmi at dsl-only.net> wrote:

Back to the "case Builtin::BI__builtin_dwarf_cfa" and 
"PPCTargetLowering::LowerFRAMEADDR" context:

I made the wrong change and need to retry.

The detail. . .

Passing a 1 through instead of zero did not do what I expected to the code 
generated. Instead it added one instruction:

addi    r3,r3,1

resulting in (objdump -d --prefix-addresses on the .o):

> Disassembly of section .text:
> 00000000 <_Z1fv> mflr    r0
> 00000004 <_Z1fv+0x4> stw     r31,-4(r1)
> 00000008 <_Z1fv+0x8> stw     r0,4(r1)
> 0000000c <_Z1fv+0xc> stwu    r1,-16(r1)
> 00000010 <_Z1fv+0x10> mr      r31,r1
> 00000014 <_Z1fv+0x14> mr      r3,r31
> 00000018 <_Z1fv+0x18> addi    r3,r3,1
> 0000001c <_Z1fv+0x1c> bl      0000001c <_Z1fv+0x1c>
> 00000020 <_Z1fv+0x20> addi    r1,r1,16
> 00000024 <_Z1fv+0x24> lwz     r0,4(r1)
> 00000028 <_Z1fv+0x28> lwz     r31,-4(r1)
> 0000002c <_Z1fv+0x2c> mtlr    r0
> 00000030 <_Z1fv+0x30> blr

In other words: it added the 1 as a byte offset like the comments that I 
thought were wrong said.

Since it does not appear that PPCTargetLowering::LowerFRAMEADDR would do that 
with a 1 I conclude that PPCTargetLowering::LowerFRAMEADDR is not involved with 
that figure.

So looking around. . .

/usr/src/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp has:

> case Intrinsic::eh_dwarf_cfa: {
>  SDValue CfaArg = DAG.getSExtOrTrunc(getValue(I.getArgOperand(0)), sdl,
>                                      TLI.getPointerTy(DAG.getDataLayout()));
>  SDValue Offset = DAG.getNode(ISD::ADD, sdl,
>                               CfaArg.getValueType(),
>                               DAG.getNode(ISD::FRAME_TO_ARGS_OFFSET, sdl,
>                                           CfaArg.getValueType()),
>                               CfaArg);
>  SDValue FA = DAG.getNode(
>      ISD::FRAMEADDR, sdl, TLI.getPointerTy(DAG.getDataLayout()),
>      DAG.getConstant(0, sdl, TLI.getPointerTy(DAG.getDataLayout())));
>  setValue(&I, DAG.getNode(ISD::ADD, sdl, FA.getValueType(),
>                           FA, Offset));
>  return nullptr;

And so sure enough the argument is an offset as used by this code.

And what I call the frame depth is plugged in as 0 here via "DAG.getConstant(0, 
sdl, TLI.getPointerTy(DAG.getDataLayout()))". The offset is applied after 
getting the frame address.

So I get to revert my change and try again changing the above call to use a 1 
instead.

It does not look like this changes the time frames in my history notes: it has 
been using frame depth zero since V2.7 when "case 
Builtin::BI__builtin_dwarf_cfa" appeared.

In general my overall questions about the target triple controlling which value 
to use (in DAG.getConstant hrere) still apply: It is not obvious that something 
that has been using frame depth 0 since V2.7 can be immediately changed to 
frame depth 1 for all contexts.

===
Mark Millard
markmi at dsl-only.net

On 2016-Feb-28, at 8:49 PM, Mark Millard <markmi at dsl-only.net> wrote:

Here is what the "ABI for the ARM 32 32-bit Architecture" "DWARF for the ARM 
Architecture" document says about the CFA:

> 3.4 Canonical Frame Address
> 
> The term Canonical Frame Address (CFA) is defined in [GDWARF], §6.4, Call 
> Frame Information. This ABI adopts the typical definition of CFA given there.
>  The CFA is the value of the stack pointer (r13) at the call site in the 
> previous frame.

This, with the armv6 code I've shown via "objdump -d", indicates that for armv6 
clang++'s __builtin_dwarf_cfa() return value is not the same value as the 
official ARM ABI indicates. It also indicates that what g++ returns does match 
the official ARM ABI.

===
Mark Millard
markmi at dsl-only.net

On 2016-Feb-28, at 5:40 PM, Mark Millard <markmi at dsl-only.net> wrote:

Looking some at clang/llvm history shows releases/branches:

V2.6 did not have "case Builtin::BI__builtin_dwarf_cfa".
V2.7 did have "case Builtin::BI__builtin_dwarf_cfa" but 
PPCTargetLowering::LowerFRAMEADDR ignored the argument.
V2.8 had PPCTargetLowering::LowerFRAMEADDR using its argument (as a frame 
depth, not a byte offset).

The apparently incorrect (not matching g++ frame depth returned) comments, 
naming, and value (when viewed as a frame depth) for "case 
Builtin::BI__builtin_dwarf_cfa" started in V2.7 and continues to this day.

That is a lot of time for various dependencies on the clang (mis)definition to 
accumulate across everything that uses clang.

It may be that limiting any change to specific TARGET_ARCH's for FreeBSD is 
appropriate. FreeBSD would likely need to list the appropriate TARGET_ARCH's, 
likely including powerpc and powerpc64 since clang before 3.8.0 was not in use 
for buildworld for powerpc/powerpc64.

Still this may have consequences for ports that use clang and might reference 
clang-compiled __builtin_dwarf_cfa() use, possibly from a lang/clang* instead 
of the system clang. My guess is that the interoperability with being able to 
use g++ vintages as well may lead to (modern?) lang/clang*'s tracking the fix 
for FreeBSD TARGET_ARCH's that are fixed.

I can ignore all this and build a system based on using 1 as the frame depth 
just to test, just as a matter of proof of concept for powerpc. (Powerpc64 hits 
a system libgcc_s defect and so needs more before C++ exceptions can be tested 
overall.)

===
Mark Millard
markmi at dsl-only.net

On 2016-Feb-28, at 2:20 PM, Mark Millard <markmi at dsl-only.net> wrote:

In /usr/src/contrib/llvm/tools/clang/lib/CodeGen/CGBuiltin.cpp there is:

> case Builtin::BI__builtin_dwarf_cfa: {
> // The offset in bytes from the first argument to the CFA.
> //
> // Why on earth is this in the frontend?  Is there any reason at
> // all that the backend can't reasonably determine this while
> // lowering llvm.eh.dwarf.cfa()?
> //
> // TODO: If there's a satisfactory reason, add a target hook for
> // this instead of hard-coding 0, which is correct for most targets.
> int32_t Offset = 0;
> 
> Value *F = CGM.getIntrinsic(Intrinsic::eh_dwarf_cfa);
> return RValue::get(Builder.CreateCall(F,
>                                 llvm::ConstantInt::get(Int32Ty, Offset)));
> }

I would have guessed that the internal argument was how many frames away on the 
stack to go from what 0 produces (high er address direction). g++'s 
__builtin_dwarf_cfa() returns the address for the next frame compared to clang 
3.8.0 (higher address direction).

I'd call that more of a frame depth than an offset. .eh_frame and its cfa 
material use offset terminology as byte offsets. And the comments above talk of 
an offset in bytes --but "next frame" distances in bytes would not be constant.

Looking at a use of LowerFRAMEADDR in a LowerRETURNADDR, for example,

> SDValue ARMTargetLowering::LowerRETURNADDR(SDValue Op, SelectionDAG &DAG) 
> const{
> . . .
> EVT VT = Op.getValueType();
> SDLoc dl(Op);
> unsigned Depth = cast<ConstantSDNode>(Op.getOperand(0))->getZExtValue();
> if (Depth) {
> SDValue FrameAddr = LowerFRAMEADDR(Op, DAG);
> SDValue Offset = DAG.getConstant(4, dl, MVT::i32);
> return DAG.getLoad(VT, dl, DAG.getEntryNode(),
>                  DAG.getNode(ISD::ADD, dl, VT, FrameAddr, Offset),
>                  MachinePointerInfo(), false, false, false, 0);
> }
> . . .
> }

(PPCTargetLowering::LowerRETURNADDR is similar.) 

This has a mix of Depth and Offset overall, with the depth going to 
LowerFRAMEADDR via Op but Offset used later in GAG.getLoad via adding to the 
FrameAddr.

This would lead me to guess that the terminology and comments in "case 
Builtin::BI__builtin_dwarf_cfa" are wrong and that the Builder.CreateCall has 
been given a frame depth, not an offset.

> SDValue PPCTargetLowering::LowerFRAMEADDR(SDValue Op,
>                                     SelectionDAG &DAG) const {
> SDLoc dl(Op);
> unsigned Depth = cast<ConstantSDNode>(Op.getOperand(0))->getZExtValue();
> 
> MachineFunction &MF = DAG.getMachineFunction();
> MachineFrameInfo *MFI = MF.getFrameInfo();
> MFI->setFrameAddressIsTaken(true);
> 
> EVT PtrVT = DAG.getTargetLoweringInfo().getPointerTy(MF.getDataLayout());
> bool isPPC64 = PtrVT == MVT::i64;
> 
> // Naked functions never have a frame pointer, and so we use r1. For all
> // other functions, this decision must be delayed until during PEI.
> unsigned FrameReg;
> if (MF.getFunction()->hasFnAttribute(Attribute::Naked))
> FrameReg = isPPC64 ? PPC::X1 : PPC::R1;
> else
> FrameReg = isPPC64 ? PPC::FP8 : PPC::FP;
> 
> SDValue FrameAddr = DAG.getCopyFromReg(DAG.getEntryNode(), dl, FrameReg,
>                                    PtrVT);
> while (Depth--)
> FrameAddr = DAG.getLoad(Op.getValueType(), dl, DAG.getEntryNode(),
>                       FrameAddr, MachinePointerInfo(), false, false,
>                       false, 0);
> return FrameAddr;
> } 

Again Op is called Depth --and is used to get from one frame pointer value to 
the next: a frame depth.

To match g++ 4.2.1 the value to use is 1 for depth.

Overall, at least applied to powerpc/powerpc64:

> . . .

> // TODO: If there's a satisfactory reason, add a target hook for
> // this instead of hard-coding 0, which is correct for most targets.
> int32_t Offset = 0;

I think the comments in this area are actually talking about byte offsets, not 
depths and are just wrong. A byte offset of 0 would make sense relative to 
hardcoding but the value is actually a frame depth --a very different context.

I think that the naming of the variable is just wrong: it should be called 
Depth.

And I think that the comments should have originally talked about using a hard 
coded Depth 1 to match g++ and preexisting library usage of 
__builtin_dwarf_cfa() for C++ and other exception handling (.eh_frame usage). 
ANd the code should avhe matched.

As far as I can tell this error in the "case Builtin::BI__builtin_dwarf_cfa:" 
design was not caught until now.

But since the mess has been around a longtime just switching everything to 
match the g++ context now likely has its own problems. (Not just a FreeBSD 
issue.)

For FreeBSD I expect that Depth 1 could be used for powerpc and powerpc64: if 
it has been wrong for a long time (not just 3.8.0) then powerpc/powerpc64 
FreeBSD has likely been broken for C++ exception handling when buildworld was 
via clang for just as long. (But only recently has clang gotten this near 
working for buildworld for at least one of powerpc/powerpc64. Currently powerpc 
is closer, given that powerpc64 does not support softfloat last I knew.)

For other TARGET_ARCH's:

For FreeBSD armv6 it is less clear to me: it is based on clang as it is and I 
do not know what C++ exception ABI it uses. If a modern gcc/g++ buildworld had 
problems with C++ exception handling, does anything need to be done about it? 
For FreeBSD armv6 and the like: is xtoolchain like support important?

FreeBSD may have similar questions for other TARGET_ARCH's.

===
Mark Millard
markmi at dsl-only.net

On 2016-Feb-28, at 2:46 AM, Mark Millard <markmi at dsl-only.net> wrote:

On 2016-Feb-28, at 12:39 AM, Roman Divacky <rdivacky at vlakno.cz> wrote:
> 
> Mark,
> 
> __builtin_dwarf_cfa() is lowered in clang to llvm intrinsic eh_dwarf_cfa.
> There's a depth argument (which defaults to 0, saying it's correct for most
> targets). 
> 
> Then the intrinsic gets lowered in SelectionDAG using
> PPCTargetLowering::LowerFRAMEADDR()
> 
> 
> Can you check that 1) the depth should be 0 for ppc64/ppc32 2) that
> LowerFRAMEADDR() does something sensible?
> 
> There's a loop depth-times, so I wonder if that makes a difference.
> 
> Thanks, Roman

"Lowered"? I'm not familiar with the clang code base or its terminology. Handed 
off to a lower level interface, may be?

As near as I can tell libgcc_s could be made to deal with clang 3.8.0's way of 
__builtin_dwarf_cfa() working for powerpc/powerpc64. But then use with g++ 
would be broken on powerpc/powerpc64 unless there were some sort of live "which 
compiler's type of code" test also involved.

Having only one libgcc_s and multiple compilers using it for a given 
TARGET_ARCH= (for example, devel/powerpc64-xtoolchain-gcc like uses) suggests 
sticking to one convention per TARGET_ARCH= for __builtin_dwarf_cfa().

I would guess that g++ conventions win in this type of context for FreeBSD, 
under the guideline of trying to be gcc 4.2.1 "ABI" compatible. libgcc_s from 
FreeBSD works for C++ exceptions with its gcc 4.2.1 for powerpc and powerpc64 
as things are as far as I know.

But for clang++ FreeBSD is one context among many and other libraries may be 
based on clang 3.8.0's existing interpretation, without gcc/g++ compatibility 
constraints. (I've no experience with earlier clang vintages for the issue.) It 
may be that any change needs to be FreeBSD target-triple specific for all I 
know. In essence: making the convention part of the ABI chosen.

I'll probably get some sleep before looking much at the code that you 
reference. A quick look at part of it suggests a fair amount of research/study 
for me to interpret things reliably.

The loop may be somewhat analogous to _Unwind_RaiseException's loop, but for a 
specific depth. I would currently guess that depth 1 would match gcc 4.2.1's 
result for __builtin_dwarf_cfa().

But there was also some other "address"(?) builtin support routine nearby that 
seemed to call into LowerFRAMEADDR() and I've no clue if g++ 4.2.1 uses the 
same depth-figure standard for both sorts of things or not. For all I know both 
types of builtins(?) might have mismatches with gcc/g++ 4.2.1 and both might 
need fixes.

I do vaguely remember seeing a builtin in FreeBSD code for something that had 
an explicit number in the argument list, possibly 
__builtin_frame_address(n)(?). But I only saw __builtin_dwarf_cfa() with no 
argument in the argument list as far as I remember.

If clang 3.8.0 and gcc 4.2.1 disagreed about what the numbering standard 
referred to for __builtin_frame_address(n) (or whatever it was), that would not 
be good and compatibility would imply conversions between the conventions for 
the 2 FreeBSD contexts.

I have not checked for armv6 related clang++ vs. g++ compatibility for C++ 
exception-handling. If anything is not operating for that context I expect that 
it would be g++ that generated buildworld code that did not work based on the 
FreeBSD source code: clang/clang++ is the normal compiler and kyua seemed to 
operate, unlike on the powerpc/powerpc64.

I've never tried to build armv6 via an equivalent of devel/powerpc64-gcc. I do 
not know if armv6 even uses the same sort of C++ exception-handling ABI or not. 
But I do know that __builtin_dwarf_cfa() is not compatible between clang++ and 
g++ from the 2-line source and comparing objdump -d results.

So more than powerpc/powerpc64 might be involved overall.

===
Mark Millard
markmi at dsl-only.net

_______________________________________________
freebsd-toolchain@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-toolchain
To unsubscribe, send any mail to "freebsd-toolchain-unsubscr...@freebsd.org"

Re: clang 3.8.0 can mess up __builtin_dwarf_cfa (), at least for TARGET_ARCH=armv6, powerpc, powerpc64: a bug 207325 update

Reply via email to