Re: clang 3.8.0 can mess up __builtin_dwarf_cfa (), at least for TARGET_ARCH=armv6, powerpc, powerpc64: a bug 207325 update

Mark Millard Mon, 29 Feb 2016 03:21:13 -0800

TARGET_ARCH=powerpc: Using Frame Depth 1 in "case Intrinsic::eh_dwarf_cfa" (and 
Offset 0 in "case Builtin::BI__builtin_dwarf_cfa") for 
PPCTargetLowering::LowerFRAMEADDR related use has allowed getting into 
_Unwind_RaiseException_Phase2 and __cxxabiv1::__gxx_personality_v0. The example 
is the 8 line example compiled under g++ 4.2.1 but then used under a buildworld 
that was built with clang 3.8.0:

# ldd exception_test.g++421.powerpc 
exception_test.g++421.powerpc:
        libstdc++.so.6 => /usr/local/lib/gcc49/libstdc++.so.6 (0x41840000)
        libm.so.5 => /lib/libm.so.5 (0x4196a000)
        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x419a1000)
        libc.so.7 => /lib/libc.so.7 (0x419c0000)

_Unwind_RaiseException_Phase2 is well past the point of the failure and crash 
from having Frame Depth 0 instead.

It is getting a SEGV during the _Unwind_SetGR called via:

710       /* For targets with pointers smaller than the word size, we must 
extend the
711          pointer, and this extension is target dependent.  */
712       _Unwind_SetGR (context, __builtin_eh_return_data_regno (0),
713                      __builtin_extend_pointer (ue_header));

for:

_Unwind_SetGR (context=0xffffd570, index=3, val=1105272896) at 
/usr/src/gnu/lib/libgcc/../../../contrib/gcc/unwind-dw2.c:207

context->reg[3] is 0x0 and so its use in the following gets the SEGV.

217       ptr = context->reg[index];
218     
219       if (size == sizeof(_Unwind_Ptr))
220         * (_Unwind_Ptr *) ptr = val;

I'm not going to try to analyze the source of this before getting some sleep.

For the 8 line program being compiled by clang++ 3.8.0 instead the results are 
different than the above and than the original behavior: The program does not 
crash abnormally but also does not find the catch clause that it should. The 
std::terminate gets its normal SIGABRT instead of an earlier SEGV.

Again I'm not going to try to analyze the details before getting some sleep.

But I will mention that I've also already submitted a report that libgcc_s does 
not completely implement DW_CFA_remember_state and DW_CFA_restore_state and 
that the code generated on powerpc64 touches the defect and so ends up with 
misbehavior. These might be similar.

===
Mark Millard
markmi at dsl-only.net

On 2016-Feb-28, at 10:13 PM, Mark Millard <markmi at dsl-only.net> wrote:

Back to the "case Builtin::BI__builtin_dwarf_cfa" and 
"PPCTargetLowering::LowerFRAMEADDR" context:

I made the wrong change and need to retry.

The detail. . .

Passing a 1 through instead of zero did not do what I expected to the code 
generated. Instead it added one instruction:

addi    r3,r3,1

resulting in (objdump -d --prefix-addresses on the .o):

> Disassembly of section .text:
> 00000000 <_Z1fv> mflr    r0
> 00000004 <_Z1fv+0x4> stw     r31,-4(r1)
> 00000008 <_Z1fv+0x8> stw     r0,4(r1)
> 0000000c <_Z1fv+0xc> stwu    r1,-16(r1)
> 00000010 <_Z1fv+0x10> mr      r31,r1
> 00000014 <_Z1fv+0x14> mr      r3,r31
> 00000018 <_Z1fv+0x18> addi    r3,r3,1
> 0000001c <_Z1fv+0x1c> bl      0000001c <_Z1fv+0x1c>
> 00000020 <_Z1fv+0x20> addi    r1,r1,16
> 00000024 <_Z1fv+0x24> lwz     r0,4(r1)
> 00000028 <_Z1fv+0x28> lwz     r31,-4(r1)
> 0000002c <_Z1fv+0x2c> mtlr    r0
> 00000030 <_Z1fv+0x30> blr

In other words: it added the 1 as a byte offset like the comments that I 
thought were wrong said.

Since it does not appear that PPCTargetLowering::LowerFRAMEADDR would do that 
with a 1 I conclude that PPCTargetLowering::LowerFRAMEADDR is not involved with 
that figure.

So looking around. . .

/usr/src/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp has:

>  case Intrinsic::eh_dwarf_cfa: {
>    SDValue CfaArg = DAG.getSExtOrTrunc(getValue(I.getArgOperand(0)), sdl,
>                                        TLI.getPointerTy(DAG.getDataLayout()));
>    SDValue Offset = DAG.getNode(ISD::ADD, sdl,
>                                 CfaArg.getValueType(),
>                                 DAG.getNode(ISD::FRAME_TO_ARGS_OFFSET, sdl,
>                                             CfaArg.getValueType()),
>                                 CfaArg);
>    SDValue FA = DAG.getNode(
>        ISD::FRAMEADDR, sdl, TLI.getPointerTy(DAG.getDataLayout()),
>        DAG.getConstant(0, sdl, TLI.getPointerTy(DAG.getDataLayout())));
>    setValue(&I, DAG.getNode(ISD::ADD, sdl, FA.getValueType(),
>                             FA, Offset));
>    return nullptr;

And so sure enough the argument is an offset as used by this code.

And what I call the frame depth is plugged in as 0 here via "DAG.getConstant(0, 
sdl, TLI.getPointerTy(DAG.getDataLayout()))". The offset is applied after 
getting the frame address.

So I get to revert my change and try again changing the above call to use a 1 
instead.

It does not look like this changes the time frames in my history notes: it has 
been using frame depth zero since V2.7 when "case 
Builtin::BI__builtin_dwarf_cfa" appeared.

In general my overall questions about the target triple controlling which value 
to use (in DAG.getConstant hrere) still apply: It is not obvious that something 
that has been using frame depth 0 since V2.7 can be immediately changed to 
frame depth 1 for all contexts.

===
Mark Millard
markmi at dsl-only.net

On 2016-Feb-28, at 8:49 PM, Mark Millard <markmi at dsl-only.net> wrote:

Here is what the "ABI for the ARM 32 32-bit Architecture" "DWARF for the ARM 
Architecture" document says about the CFA:

> 3.4 Canonical Frame Address
> 
> The term Canonical Frame Address (CFA) is defined in [GDWARF], §6.4, Call 
> Frame Information. This ABI adopts the typical definition of CFA given there.
>  The CFA is the value of the stack pointer (r13) at the call site in the 
> previous frame.

This, with the armv6 code I've shown via "objdump -d", indicates that for armv6 
clang++'s __builtin_dwarf_cfa() return value is not the same value as the 
official ARM ABI indicates. It also indicates that what g++ returns does match 
the official ARM ABI.

===
Mark Millard
markmi at dsl-only.net

On 2016-Feb-28, at 5:40 PM, Mark Millard <markmi at dsl-only.net> wrote:

Looking some at clang/llvm history shows releases/branches:

V2.6 did not have "case Builtin::BI__builtin_dwarf_cfa".
V2.7 did have "case Builtin::BI__builtin_dwarf_cfa" but 
PPCTargetLowering::LowerFRAMEADDR ignored the argument.
V2.8 had PPCTargetLowering::LowerFRAMEADDR using its argument (as a frame 
depth, not a byte offset).

The apparently incorrect (not matching g++ frame depth returned) comments, 
naming, and value (when viewed as a frame depth) for "case 
Builtin::BI__builtin_dwarf_cfa" started in V2.7 and continues to this day.

That is a lot of time for various dependencies on the clang (mis)definition to 
accumulate across everything that uses clang.

It may be that limiting any change to specific TARGET_ARCH's for FreeBSD is 
appropriate. FreeBSD would likely need to list the appropriate TARGET_ARCH's, 
likely including powerpc and powerpc64 since clang before 3.8.0 was not in use 
for buildworld for powerpc/powerpc64.

Still this may have consequences for ports that use clang and might reference 
clang-compiled __builtin_dwarf_cfa() use, possibly from a lang/clang* instead 
of the system clang. My guess is that the interoperability with being able to 
use g++ vintages as well may lead to (modern?) lang/clang*'s tracking the fix 
for FreeBSD TARGET_ARCH's that are fixed.

I can ignore all this and build a system based on using 1 as the frame depth 
just to test, just as a matter of proof of concept for powerpc. (Powerpc64 hits 
a system libgcc_s defect and so needs more before C++ exceptions can be tested 
overall.)

===
Mark Millard
markmi at dsl-only.net

On 2016-Feb-28, at 2:20 PM, Mark Millard <markmi at dsl-only.net> wrote:

In /usr/src/contrib/llvm/tools/clang/lib/CodeGen/CGBuiltin.cpp there is:

> case Builtin::BI__builtin_dwarf_cfa: {
> // The offset in bytes from the first argument to the CFA.
> //
> // Why on earth is this in the frontend?  Is there any reason at
> // all that the backend can't reasonably determine this while
> // lowering llvm.eh.dwarf.cfa()?
> //
> // TODO: If there's a satisfactory reason, add a target hook for
> // this instead of hard-coding 0, which is correct for most targets.
> int32_t Offset = 0;
> 
> Value *F = CGM.getIntrinsic(Intrinsic::eh_dwarf_cfa);
> return RValue::get(Builder.CreateCall(F,
>                                   llvm::ConstantInt::get(Int32Ty, Offset)));
> }

I would have guessed that the internal argument was how many frames away on the 
stack to go from what 0 produces (high er address direction). g++'s 
__builtin_dwarf_cfa() returns the address for the next frame compared to clang 
3.8.0 (higher address direction).

I'd call that more of a frame depth than an offset. .eh_frame and its cfa 
material use offset terminology as byte offsets. And the comments above talk of 
an offset in bytes --but "next frame" distances in bytes would not be constant.

Looking at a use of LowerFRAMEADDR in a LowerRETURNADDR, for example,

> SDValue ARMTargetLowering::LowerRETURNADDR(SDValue Op, SelectionDAG &DAG) 
> const{
> . . .
> EVT VT = Op.getValueType();
> SDLoc dl(Op);
> unsigned Depth = cast<ConstantSDNode>(Op.getOperand(0))->getZExtValue();
> if (Depth) {
> SDValue FrameAddr = LowerFRAMEADDR(Op, DAG);
> SDValue Offset = DAG.getConstant(4, dl, MVT::i32);
> return DAG.getLoad(VT, dl, DAG.getEntryNode(),
>                    DAG.getNode(ISD::ADD, dl, VT, FrameAddr, Offset),
>                    MachinePointerInfo(), false, false, false, 0);
> }
> . . .
> }

(PPCTargetLowering::LowerRETURNADDR is similar.) 

This has a mix of Depth and Offset overall, with the depth going to 
LowerFRAMEADDR via Op but Offset used later in GAG.getLoad via adding to the 
FrameAddr.

This would lead me to guess that the terminology and comments in "case 
Builtin::BI__builtin_dwarf_cfa" are wrong and that the Builder.CreateCall has 
been given a frame depth, not an offset.

> SDValue PPCTargetLowering::LowerFRAMEADDR(SDValue Op,
>                                       SelectionDAG &DAG) const {
> SDLoc dl(Op);
> unsigned Depth = cast<ConstantSDNode>(Op.getOperand(0))->getZExtValue();
> 
> MachineFunction &MF = DAG.getMachineFunction();
> MachineFrameInfo *MFI = MF.getFrameInfo();
> MFI->setFrameAddressIsTaken(true);
> 
> EVT PtrVT = DAG.getTargetLoweringInfo().getPointerTy(MF.getDataLayout());
> bool isPPC64 = PtrVT == MVT::i64;
> 
> // Naked functions never have a frame pointer, and so we use r1. For all
> // other functions, this decision must be delayed until during PEI.
> unsigned FrameReg;
> if (MF.getFunction()->hasFnAttribute(Attribute::Naked))
> FrameReg = isPPC64 ? PPC::X1 : PPC::R1;
> else
> FrameReg = isPPC64 ? PPC::FP8 : PPC::FP;
> 
> SDValue FrameAddr = DAG.getCopyFromReg(DAG.getEntryNode(), dl, FrameReg,
>                                      PtrVT);
> while (Depth--)
> FrameAddr = DAG.getLoad(Op.getValueType(), dl, DAG.getEntryNode(),
>                         FrameAddr, MachinePointerInfo(), false, false,
>                         false, 0);
> return FrameAddr;
> } 

Again Op is called Depth --and is used to get from one frame pointer value to 
the next: a frame depth.

To match g++ 4.2.1 the value to use is 1 for depth.

Overall, at least applied to powerpc/powerpc64:

> . . .

> // TODO: If there's a satisfactory reason, add a target hook for
> // this instead of hard-coding 0, which is correct for most targets.
> int32_t Offset = 0;

I think the comments in this area are actually talking about byte offsets, not 
depths and are just wrong. A byte offset of 0 would make sense relative to 
hardcoding but the value is actually a frame depth --a very different context.

I think that the naming of the variable is just wrong: it should be called 
Depth.

And I think that the comments should have originally talked about using a hard 
coded Depth 1 to match g++ and preexisting library usage of 
__builtin_dwarf_cfa() for C++ and other exception handling (.eh_frame usage). 
ANd the code should avhe matched.

As far as I can tell this error in the "case Builtin::BI__builtin_dwarf_cfa:" 
design was not caught until now.

But since the mess has been around a longtime just switching everything to 
match the g++ context now likely has its own problems. (Not just a FreeBSD 
issue.)

For FreeBSD I expect that Depth 1 could be used for powerpc and powerpc64: if 
it has been wrong for a long time (not just 3.8.0) then powerpc/powerpc64 
FreeBSD has likely been broken for C++ exception handling when buildworld was 
via clang for just as long. (But only recently has clang gotten this near 
working for buildworld for at least one of powerpc/powerpc64. Currently powerpc 
is closer, given that powerpc64 does not support softfloat last I knew.)

For other TARGET_ARCH's:

For FreeBSD armv6 it is less clear to me: it is based on clang as it is and I 
do not know what C++ exception ABI it uses. If a modern gcc/g++ buildworld had 
problems with C++ exception handling, does anything need to be done about it? 
For FreeBSD armv6 and the like: is xtoolchain like support important?

FreeBSD may have similar questions for other TARGET_ARCH's.

===
Mark Millard
markmi at dsl-only.net

On 2016-Feb-28, at 2:46 AM, Mark Millard <markmi at dsl-only.net> wrote:

On 2016-Feb-28, at 12:39 AM, Roman Divacky <rdivacky at vlakno.cz> wrote:
> 
> Mark,
> 
> __builtin_dwarf_cfa() is lowered in clang to llvm intrinsic eh_dwarf_cfa.
> There's a depth argument (which defaults to 0, saying it's correct for most
> targets). 
> 
> Then the intrinsic gets lowered in SelectionDAG using
> PPCTargetLowering::LowerFRAMEADDR()
> 
> 
> Can you check that 1) the depth should be 0 for ppc64/ppc32 2) that
> LowerFRAMEADDR() does something sensible?
> 
> There's a loop depth-times, so I wonder if that makes a difference.
> 
> Thanks, Roman

"Lowered"? I'm not familiar with the clang code base or its terminology. Handed 
off to a lower level interface, may be?

As near as I can tell libgcc_s could be made to deal with clang 3.8.0's way of 
__builtin_dwarf_cfa() working for powerpc/powerpc64. But then use with g++ 
would be broken on powerpc/powerpc64 unless there were some sort of live "which 
compiler's type of code" test also involved.

Having only one libgcc_s and multiple compilers using it for a given 
TARGET_ARCH= (for example, devel/powerpc64-xtoolchain-gcc like uses) suggests 
sticking to one convention per TARGET_ARCH= for __builtin_dwarf_cfa().

I would guess that g++ conventions win in this type of context for FreeBSD, 
under the guideline of trying to be gcc 4.2.1 "ABI" compatible. libgcc_s from 
FreeBSD works for C++ exceptions with its gcc 4.2.1 for powerpc and powerpc64 
as things are as far as I know.

But for clang++ FreeBSD is one context among many and other libraries may be 
based on clang 3.8.0's existing interpretation, without gcc/g++ compatibility 
constraints. (I've no experience with earlier clang vintages for the issue.) It 
may be that any change needs to be FreeBSD target-triple specific for all I 
know. In essence: making the convention part of the ABI chosen.

I'll probably get some sleep before looking much at the code that you 
reference. A quick look at part of it suggests a fair amount of research/study 
for me to interpret things reliably.

The loop may be somewhat analogous to _Unwind_RaiseException's loop, but for a 
specific depth. I would currently guess that depth 1 would match gcc 4.2.1's 
result for __builtin_dwarf_cfa().

But there was also some other "address"(?) builtin support routine nearby that 
seemed to call into LowerFRAMEADDR() and I've no clue if g++ 4.2.1 uses the 
same depth-figure standard for both sorts of things or not. For all I know both 
types of builtins(?) might have mismatches with gcc/g++ 4.2.1 and both might 
need fixes.

I do vaguely remember seeing a builtin in FreeBSD code for something that had 
an explicit number in the argument list, possibly 
__builtin_frame_address(n)(?). But I only saw __builtin_dwarf_cfa() with no 
argument in the argument list as far as I remember.

If clang 3.8.0 and gcc 4.2.1 disagreed about what the numbering standard 
referred to for __builtin_frame_address(n) (or whatever it was), that would not 
be good and compatibility would imply conversions between the conventions for 
the 2 FreeBSD contexts.

I have not checked for armv6 related clang++ vs. g++ compatibility for C++ 
exception-handling. If anything is not operating for that context I expect that 
it would be g++ that generated buildworld code that did not work based on the 
FreeBSD source code: clang/clang++ is the normal compiler and kyua seemed to 
operate, unlike on the powerpc/powerpc64.

I've never tried to build armv6 via an equivalent of devel/powerpc64-gcc. I do 
not know if armv6 even uses the same sort of C++ exception-handling ABI or not. 
But I do know that __builtin_dwarf_cfa() is not compatible between clang++ and 
g++ from the 2-line source and comparing objdump -d results.

So more than powerpc/powerpc64 might be involved overall.

===
Mark Millard
markmi at dsl-only.net

_______________________________________________
freebsd-toolchain@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-toolchain
To unsubscribe, send any mail to "freebsd-toolchain-unsubscr...@freebsd.org"

Re: clang 3.8.0 can mess up __builtin_dwarf_cfa (), at least for TARGET_ARCH=armv6, powerpc, powerpc64: a bug 207325 update

Reply via email to