Re: [edk2] edk2 llvm branch

Andrew Fish Mon, 30 May 2016 10:26:14 -0700

> On May 29, 2016, at 10:47 PM, Shi, Steven <steven....@intel.com> wrote:
> 
> Hi Andrew,
> 
> I think I have root-caused the issue where Clang LTO X64 OVMF hangs in SEC. 
> LLVM LTO does not yet support the large code model, so X64 LTO code cannot 
> be loaded and run at a high address (above 2GB). Please see the details in 
> the llvm thread linked below. An Apple engineer (Mehdi) says ld64 on OS X 
> does not support the large code model in LTO either, which means your Xcode 
> LTO tool chain should have the same problem.
>


Steven,

We don't have any issues using Xcode. I think you are confused about needing 
the large code model. The small model means the program can't be larger than 
2GB. It does not restrict the address the code can run at; all Mac OS X 
applications run at addresses > 4 GB, for example. The SEC (like all EFI 
drivers/applications) is linked at 0x0 (or 0x240 for ELF and Mach-O to make 
space for the PE/COFF header). The build tools relocate the PE/COFF image to 
its address in the FV so it can execute in place. 
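
A minimal sketch of what relocating an image linked at 0x0 to its FV address 
amounts to for one 64-bit absolute fixup. This is not the actual build tool 
code; rebase_dir64 and its parameters are hypothetical names used only for 
illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: patch one 64-bit absolute address stored at
 * image[offset] when the image moves from linked_base to load_base. */
static void rebase_dir64(uint8_t *image, size_t offset,
                         uint64_t linked_base, uint64_t load_base)
{
    uint64_t value;

    memcpy(&value, image + offset, sizeof value);  /* alignment-safe read */
    value += load_base - linked_base;              /* apply the load delta */
    memcpy(image + offset, &value, sizeof value);  /* write the fixup back */
}
```

Only absolute addresses stored in the image need this treatment; %rip-relative 
references move with the code and need no fixup.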

The small and large models are about PIC (Position Independent Code) and have 
to do with how big the offset from the PC (%rip) is. The small model uses a 
signed 32-bit offset, so you can only go 2GB in any given direction, and that 
is what limits the size, but it makes each PC-relative instruction smaller 
(saves 4 bytes). 
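
The 2GB reach can be sketched as a range check on the distance between the 
next instruction and the target. fits_rel32 is a hypothetical helper written 
for this note, not something taken from a compiler or linker.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can an instruction whose following instruction starts
 * at next_rip reach target with a signed 32-bit %rip-relative offset?
 * Only the distance matters, not how high the addresses themselves are. */
static bool fits_rel32(uint64_t next_rip, uint64_t target)
{
    int64_t delta = (int64_t)(target - next_rip);

    return delta >= INT32_MIN && delta <= INT32_MAX;
}
```

Under this check a reference between two addresses above 4 GB fits as long as 
they are less than 2GB apart, while a target 2GB or more ahead does not.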

For example, if you read a global like this, the compiler will generate the 
following code. 
int constant = 0;

int get_constant(void)
{
        return constant;
}

(lldb) dis -n get_constant -b
a.out`get_constant:
a.out[0x100000f8c] <+0>:  55                 pushq  %rbp
a.out[0x100000f8d] <+1>:  48 89 e5           movq   %rsp, %rbp
a.out[0x100000f90] <+4>:  8b 05 6a 00 00 00  movl   0x6a(%rip), %eax
a.out[0x100000f96] <+10>: 5d                 popq   %rbp
a.out[0x100000f97] <+11>: c3                 retq

At link time the linker figures out the offset (plus or minus) from the PC 
required to access the global. In the example above the data section follows 
the text section, so the global is at PC + 0x6a, where PC is the address of 
the next instruction. The small model implies that the %rip-relative move uses 
a 32-bit offset, and you can see the mov instruction is 6 bytes (an opcode 
byte, a ModRM byte, and a 32-bit offset). This code does not have any issue 
running near the x86 reset vector just under 4 GB. 
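
To make the arithmetic concrete, here is a small sketch that decodes the 
6-byte movl from the listing above and computes the address it reads. The 
function name rip_relative_target is made up for this example, and it only 
handles this one encoding (opcode 8b, ModRM 05, then a little-endian 32-bit 
offset).

```c
#include <stdint.h>

/* Given the address and bytes of a 6-byte `movl disp32(%rip), %eax`,
 * return the address the instruction loads from. */
static uint64_t rip_relative_target(uint64_t insn_addr, const uint8_t insn[6])
{
    /* Assemble the little-endian offset that follows opcode and ModRM. */
    uint32_t raw = (uint32_t)insn[2]
                 | ((uint32_t)insn[3] << 8)
                 | ((uint32_t)insn[4] << 16)
                 | ((uint32_t)insn[5] << 24);
    int32_t disp = (int32_t)raw;          /* the offset is signed */
    uint64_t next_rip = insn_addr + 6;    /* %rip points past the instruction */

    return next_rip + (uint64_t)(int64_t)disp;
}
```

For the bytes 8b 05 6a 00 00 00 at 0x100000f90 this gives 0x100000f96 + 0x6a, 
which is 0x100001000.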

Your reported issue was a register-indirect absolute JMP instruction which is 
going to be the same in both models. I think the way this works is %rcx will be 
PIC (calculated relative to %rip) and the constant is an offset to the jmp 
table and the table index. 

 jmpq     *.LJTI3_0(,%rcx,8)
 jmpq     0xfffcdd54(,%rcx,8)

It seems like this code is saying go backwards 200K which seems broken in 
general (How big is your SEC?). So this is likely a code gen bug. 
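
For reference, the "backwards 200K" reading comes from treating 0xfffcdd54 as 
a signed 32-bit value. The sketch below assumes the usual x86-64 rule that a 
32-bit displacement in a memory operand is sign-extended to 64 bits before the 
scaled index is added; jump_table_slot is a hypothetical name used only here.

```c
#include <stdint.h>

/* Address of the jump-table slot read by `jmpq *disp32(,%index,8)` in 64-bit
 * mode: sign-extend the 32-bit displacement, then add index * 8. */
static uint64_t jump_table_slot(uint32_t disp32, uint64_t index)
{
    int64_t sext = (int32_t)disp32;   /* 0xfffcdd54 becomes -205484 */

    return (uint64_t)sext + index * 8;
}
```

With disp32 = 0xfffcdd54 and index 0 this yields 0xfffffffffffcdd54, the same 
address reported in the quoted Simics run below, rather than 
0x00000000fffcdd54.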

This almost looks like a PE/COFF relocation was applied in error? Can you look 
at the SecMain.efi (the one linked at zero) and disassemble that instruction 
and see what value it contains. You can also dump the PE/COFF and see if that 
location contains a relocation. This looks like it may be a linker bug? 
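
One way to check whether a location has a relocation is to walk the raw bytes 
of the image's base relocation (.reloc) data. The sketch below assumes the 
standard PE/COFF block layout (a 32-bit page RVA, a 32-bit block size, then 
16-bit entries with the type in the top 4 bits and the page offset in the low 
12 bits). reloc_type_at and the rd16/rd32 helpers are hypothetical names, and 
locating the .reloc data inside the file is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian readers that do not depend on host byte order or alignment. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return the base relocation type of the entry that starts exactly at rva
 * (for example 3 = HIGHLOW, 10 = DIR64), or -1 if there is none. */
static int reloc_type_at(const uint8_t *reloc, size_t size, uint32_t rva)
{
    size_t pos = 0;

    while (pos + 8 <= size) {
        uint32_t page = rd32(reloc + pos);
        uint32_t block_size = rd32(reloc + pos + 4);

        if (block_size < 8 || block_size > size - pos)
            break;                          /* malformed block, stop walking */

        for (size_t i = 8; i + 2 <= block_size; i += 2) {
            uint16_t entry = rd16(reloc + pos + i);
            int type = entry >> 12;

            /* Type 0 entries are padding and relocate nothing. */
            if (type != 0 && page + (entry & 0x0fffu) == rva)
                return type;
        }
        pos += block_size;
    }
    return -1;
}
```

Passing the RVA of the displacement field of the jmpq (not the start of the 
instruction) shows whether a fixup targets those four bytes.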

Thanks,

Andrew Fish

> http://lists.llvm.org/pipermail/llvm-dev/2016-May/100235.html
> 
> 
> Could you do me a favor? I’m trying to persuade the LLVM community to 
> support the large code model in LTO, because it is very important for 
> enabling Clang LTO for UEFI firmware. I think the Apple compiler team has a 
> lot of influence in the LLVM community. Could you help me explain the UEFI 
> firmware requirement to your compiler team before they say no to me in the 
> LLVM community? It does not seem very difficult to enable the large code 
> model in LLVM LTO, and “it is really a trivial option to add” according to 
> Mehdi’s input. Please help!
> 
> 
> Steven Shi
> Intel\SSG\STO\UEFI Firmware
> 
> Tel: +86 021-61166522
> iNet: 821-6522
> 
> From: af...@apple.com [mailto:af...@apple.com]
> Sent: Thursday, May 26, 2016 12:53 AM
> To: Shi, Steven <steven....@intel.com>
> Cc: Kinney, Michael D <michael.d.kin...@intel.com>; edk2-devel@lists.01.org; 
> Justen, Jordan L <jordan.l.jus...@intel.com>
> Subject: Re: [edk2] edk2 llvm branch
> 
> 
> On May 25, 2016, at 9:43 AM, Shi, Steven 
> <steven....@intel.com<mailto:steven....@intel.com>> wrote:
> 
> Hi Andrew,
> For the issue of Clang LTO generating wrong code on Qemu X64, I found it is 
> related to displacements at high addresses (>2G) being wrongly sign-extended 
> to 64 bits. E.g. I expect “jmp qword ptr 0xfffcdd54[0]” in 64-bit mode to 
> jump to the address stored at [0x00000000fffcdd54+0], but in fact Clang LTO 
> generates an instruction that jumps to the address stored at 
> [0xfffffffffffcdd54+0]. Qemu X64 is special in that its SecMain runs in 
> 64-bit mode and its execution address happens to be 0x00000000fffcxxxx. If 
> any instruction needs an indirect address access, the Clang LTO code wrongly 
> accesses the very high 0xfffffffffffcxxxx range, which causes a page fault 
> in Qemu X64 SEC. This is why the issue does not happen on IA32 tip, but only 
> on X64 tip.
> 
> BTW, could you help build the Xcode LTO Qemu X64 image as below and send 
> it (edk2\Build\OvmfX64\DEBUG_CLANGLTO38\FV\OVMF.fd) to me, so I can test 
> whether the Xcode LTO Qemu X64 image really works?
> build -a X64 -t CLANGLTO38(replace it with your XCodeLTO tool chain) -p 
> OvmfPkg/OvmfPkgX64.dsc -n 5 -b DEBUG -DDEBUG_ON_SERIAL_PORT
> 
> Below are the details of this issue:
> Clang LTO aggressively inlines (or collapses) code together, and then tends 
> to use many jump far, absolute indirect instructions, which is quite 
> different from a normal Clang build. The attachment has an LTO assembly 
> code example for MdePkg\Library\BasePrintLib\PrintLibInternal.c. You can 
> create it with the commands below:
> build -a X64 -t CLANGLTO38(replace it with your XCodeLTO tool chain) -p 
> OvmfPkg/OvmfPkgX64.dsc -n 5 -b DEBUG -DDEBUG_ON_SERIAL_PORT -m 
> MdePkg/Library/BasePrintLib/BasePrintLib.inf
> llc 
> Build/OvmfX64/DEBUG_CLANGLTO38/X64/MdePkg/Library/BasePrintLib/BasePrintLib/OUTPUT/PrintLibInternal.obj
>  -o PrintLibInternal.s
> 
> In this PrintLibInternal.s, you can see that Clang LTO uses two indirect 
> far jumps, shown below. These two indirect jumps cause the Qemu X64 SecMain 
> page fault.
> # /home/jshi19/edk2-fork/MdePkg/Library/BasePrintLib/PrintLibInternal.c:446:9
>                ...
>                jmpq     *.LJTI3_0(,%rdx,8)
>                ...
> 
> # /home/jshi19/edk2-fork/MdePkg/Library/BasePrintLib/PrintLibInternal.c:528:7
>                ....
>                jmpq     *.LJTI3_1(,%rax,8)
>                ...
> 
> .LJTI3_0:
>                .quad    .LBB3_36
>                .quad    .LBB3_33
>                .quad    .LBB3_35
>                ... ...
> 
> .LJTI3_1:
>                .quad    .LBB3_56
>                .quad    .LBB3_26
>                .quad    .LBB3_89
>                ...
> 
> To show the page fault architectural state more clearly, I leveraged Simics 
> MinnowMax to re-run the Qemu X64 SecMain LTO code and reproduce it. I added 
> one more line at line 528 to make Simics stop just before the page fault 
> point. You can see the new PrintLibInternal.c and its LTO assembly code in 
> the attached PrintLibInternal.c-Simics and PrintLibInternal.s-Simics. Below 
> is the page fault point:
> 
>                .loc         2 528 7 is_stmt 0       # 
> /home/jshi19/edk2-fork/MdePkg/Library/BasePrintLib/PrintLibInternal.c:528:7
>                callq       MAGIC_SHOW_VALUE
>                .loc         2 529 15 is_stmt 1      # 
> /home/jshi19/edk2-fork/MdePkg/Library/BasePrintLib/PrintLibInternal.c:529:15
>                movq    -416(%rbp), %rax
>                .loc         2 529 7 is_stmt 0       # 
> /home/jshi19/edk2-fork/MdePkg/Library/BasePrintLib/PrintLibInternal.c:529:7
>                cmpq     $87, %rax
>                jle           .LBB3_66
> # BB#72:                                #   in Loop: Header=BB3_23 Depth=1
>                leaq       -97(%rax), %rcx
>                cmpq     $23, %rcx
>                ja            .LBB3_73
> # BB#76:                                #   in Loop: Header=BB3_23 Depth=1
>                jmpq     *.LJTI3_0(,%rcx,8)
>                … …
> 
> In Simics below, you can clearly see that the “jmp qword ptr 
> 0xfffcdd54[rcx*8]” instruction causes the processor to access logical 
> address 0xfffffffffffcdd54 (not the 0x00000000fffcdd54 we hoped for)
> 
> Steven,
> 
> Can you dump the bytes for the instruction and look it up in the Intel 
> manual? I did not think that a jmp does a sign extend. I usually dump the 
> code in lldb to get the bytes, assembly, and source mixed together, so I'm 
> not sure how to do it in your setup.
> 
> Thanks,
> 
> Andrew Fish
> 
> 
> which causes a page fault exception; the page fault then triggers a General 
> Protection fault (Exception 13) because there is no exception handler in the 
> SEC phase.
> <image002.png>
> The content in [0x00000000fffcdd54] is 0x00000000FFFCC8FA as below
> <image001.png>
> 
> There is no page table entry mapping logical address 0xfffffffffffcdd54, so 
> there is no content at [0xfffffffffffcdd54]:
> <image003.png>
> 
> 
> 
> 
> Steven Shi
> Intel\SSG\STO\UEFI Firmware
> 
> Tel: +86 021-61166522
> iNet: 821-6522
> 
> 
> <PrintLibInternal.s><PrintLibInternal.s-Simics><PrintLibInternal.c-Simics>
> 
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
