Hi Andrew,

Thank you! Your comments really help me!

> If you turn off LTO what does the code look like? Does it work? This looks
> like a linker error, but it is hard to say if it is the compiler or linker at
> fault.
[Steven]: If I turn off LTO, everything works very well, but maybe I was just
lucky, because OVMF doesn't use high addresses except in the SEC phase. I think
I made a mistake in removing the "-pie" linking option. I removed it because it
generates an ELF EM_X86_64 relocation that our current GenFw cannot convert;
see the build failure below. When I started enabling LTO, I just worked around
this GenFw issue to save time, but it cost me much more effort later… ☹
LTO aggressively inlines different functions together and likes to use many
instructions with indirect addressing, which exposes this position-independence
issue.

"GenFw" -e SEC -o /home/jshi19/edk2-fork/Build/OvmfX64/DEBUG_CLANGLTO38/X64/OvmfPkg/Sec/SecMain/DEBUG/SecMain.efi /home/jshi19/edk2-fork/Build/OvmfX64/DEBUG_CLANGLTO38/X64/OvmfPkg/Sec/SecMain/DEBUG/SecMain.dll
GenFw: ERROR 3000: Invalid /home/jshi19/edk2-fork/Build/OvmfX64/DEBUG_CLANGLTO38/X64/OvmfPkg/Sec/SecMain/DEBUG/SecMain.dll unsupported ELF EM_X86_64 relocation 0x4.
make: *** [/home/jshi19/edk2-fork/Build/OvmfX64/DEBUG_CLANGLTO38/X64/OvmfPkg/Sec/SecMain/DEBUG/SecMain.efi] Error 2
GenFw: ERROR 3000: Invalid /home/jshi19/edk2-fork/Build/OvmfX64/DEBUG_CLANGLTO38/X64/OvmfPkg/Sec/SecMain/DEBUG/SecMain.dll unsupported ELF EM_X86_64 relocation 0x4.
GenFw: ERROR 3000: Invalid /home/jshi19/edk2-fork/Build/OvmfX64/DEBUG_CLANGLTO38/X64/OvmfPkg/Sec/SecMain/DEBUG/SecMain.dll unsupported ELF EM_X86_64 relocation 0x4.
... ...

> I see from the llvm mail thread that you need to add the -pie linker flag
> with the small model did that resolve your issue?
[Steven]: I will first fix GenFw so that it supports this ELF EM_X86_64
relocation type, and I will let you know if I get a good result. I hope I am
very close to success.
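
For reference, the relocation numbers in this thread can be decoded by hand. The sketch below is only an illustration and is not GenFw code: it assumes GenFw prints the raw ELF relocation type, and it uses the type numbers from the System V AMD64 ABI. The function names are made up for this example. Under that numbering, 0x4 would be R_X86_64_PLT32, and the Info value 00010000000b in the quoted readelf output decodes to symbol index 1 and type 0xb, R_X86_64_32S.

```c
#include <stdint.h>

/* Relocation type numbers from the System V AMD64 ABI (subset only). */
enum {
    R_X86_64_64    = 1,   /* direct 64-bit */
    R_X86_64_PC32  = 2,   /* PC-relative 32-bit signed */
    R_X86_64_PLT32 = 4,   /* 32-bit PLT-relative, typical for calls */
    R_X86_64_32    = 10,  /* direct 32-bit, zero-extended */
    R_X86_64_32S   = 11   /* direct 32-bit, sign-extended */
};

/* ELF64 r_info: symbol index in the high 32 bits, type in the low 32 bits. */
static uint32_t reloc_sym(uint64_t r_info)
{
    return (uint32_t)(r_info >> 32);
}

static uint32_t reloc_type(uint64_t r_info)
{
    return (uint32_t)(r_info & 0xffffffffu);
}

/* Hypothetical helper: map a type number to its ABI name. */
static const char *reloc_name(uint32_t type)
{
    switch (type) {
    case R_X86_64_64:    return "R_X86_64_64";
    case R_X86_64_PC32:  return "R_X86_64_PC32";
    case R_X86_64_PLT32: return "R_X86_64_PLT32";
    case R_X86_64_32:    return "R_X86_64_32";
    case R_X86_64_32S:   return "R_X86_64_32S";
    default:             return "unknown";
    }
}
```

Calling reloc_name(reloc_type(info)) on each Info value from a readelf dump gives a quick list of the relocation types a converter would have to handle.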
☺

Steven Shi
Intel\SSG\STO\UEFI Firmware

Tel: +86 021-61166522
iNet: 821-6522

> -----Original Message-----
> From: af...@apple.com [mailto:af...@apple.com]
> Sent: Tuesday, May 31, 2016 10:54 PM
> To: Shi, Steven <steven....@intel.com>
> Cc: edk2-devel@lists.01.org
> Subject: Re: [edk2] edk2 llvm branch
>
>
> > On May 31, 2016, at 7:26 AM, Andrew Fish <af...@apple.com> wrote:
> >
> >>
> >> On May 31, 2016, at 1:01 AM, Shi, Steven <steven....@intel.com> wrote:
> >>
> >> Hi Andrew,
> >>> The ELF should be linked at zero so that seems like the bug? It looks
> >>> like your code is linking at 0x00400000. Is that breaking the code that
> >>> is converting to PE/COFF?
> >> [Steven]: No, in fact the start address of an ELF shared library does not
> >> matter. The real running address will be fixed up according to the
> >> relocation section info. OK, after I use the same link script as GCC, my
> >> SecMain.dll is also linked at zero now. But the CPU exception failure is
> >> exactly the same. I attached the new disassembly and ELF section info
> >> files to this email.
> >> readelf -a SecMain.dll > SecMain.dll.elf
> >> objdump -dS SecMain.dll > SecMain.dll.s
> >>
> >> In the new SecMain.dll.s, you will see the same two jump instructions,
> >> which are absolute indirect Qword, but given in 32bits RIP-relative
> >> addressing. The two 32bits RIP-relative addresses (0x5ea, 0x7c4 in the
> >> instructions below) are relocation addresses in the .text section and
> >> need to be fixed up when loaded to memory.
> >>
> >> 5e7: ff 24 c5 38 7d 00 00 jmpq *0x7d38(,%rax,8) <-- there is a relocation address in 0x5ea, [0x5ea] = 0x00007d38
> >> 7c1: ff 24 c5 c0 79 00 00 jmpq *0x79c0(,%rax,8) <-- there is a relocation address in 0x7c4, [0x7c4] = 0x000079c0
> >>
> >> In SecMain.dll.elf, you can clearly see that the type of the above two
> >> relocation addresses is R_X86_64_32S, which is a 32bits signed address
> >> (range is [-2GB, 2GB]), not R_X86_64_64.
> >> Relocation section '.rela.text' at offset 0x202b38 contains 78 entries:
> >> Offset       Info         Type         Sym. Value       Sym. Name + Addend
> >> 0000000005ea 00010000000b R_X86_64_32S 0000000000000240 .text + 7af8
> >> 0000000007c4 00010000000b R_X86_64_32S 0000000000000240 .text + 7780
> >>
> >>
> >>
> >>> Yes as you can't allocate memory < 4GB in an x86_64 Mac OS X App.
> >>> My example command line app shows the default is > 4GB from the thread.
> >>> (lldb) dis -n get_constant -b
> >>> a.out`get_constant:
> >>> a.out[0x100000f8c] <+0>:  55                 pushq %rbp
> >>> a.out[0x100000f8d] <+1>:  48 89 e5           movq %rsp, %rbp
> >>> a.out[0x100000f90] <+4>:  8b 05 6a 00 00 00  movl 0x6a(%rip), %eax
> >>> a.out[0x100000f96] <+10>: 5d                 popq %rbp
> >>> a.out[0x100000f97] <+11>: c3                 retq
> >>
> >> [Steven]: This is really interesting. Let me ask the LLVM community about
> >> the default code model for Mac OS X apps.
> >>
> >> Thank you!
> >>
> >> Steven Shi
> >> Intel\SSG\STO\UEFI Firmware
> >>
> >> Tel: +86 021-61166522
> >> iNet: 821-6522
> >>
> >> From: af...@apple.com [mailto:af...@apple.com]
> >> Sent: Tuesday, May 31, 2016 11:10 AM
> >> To: Shi, Steven <steven....@intel.com>
> >> Cc: Kinney, Michael D <michael.d.kin...@intel.com>; Justen, Jordan L <jordan.l.jus...@intel.com>; edk2-devel@lists.01.org
> >> Subject: Re: [edk2] edk2 llvm branch
> >>
> >>
> >> On May 30, 2016, at 6:32 PM, Shi, Steven <steven....@intel.com>
> >> wrote:
> >>
> >> Hi Andrew,
> >>
> >>> jmpq *.LJTI3_0(,%rcx,8)
> >>> jmpq 0xfffcdd54(,%rcx,8)
> >>>
> >>> It seems like this code is saying go backwards 200K which seems broken
> >>> in general (How big is your SEC?). So this is likely a code gen bug.
> >>
> >> [Steven]: OK, let me explain in detail how the wrong relocation of
> >> jmpq *.LJTI3_0(,%rcx,8) --> jmpq 0xfffcdd54(,%rcx,8) could happen in the
> >> small code model.
> >>
> >> First, let's get the SecMain.dll disassembly and all ELF section info
> >> with the commands below. I've attached my info files to this email.
> >> readelf -a SecMain.dll > SecMain.dll.elf
> >> objdump -dS SecMain.dll > SecMain.dll.s
> >>
> >> In SecMain.dll.s, you will see the two jump instructions, which are
> >> absolute indirect Qword, but given in 32bits RIP-relative addressing. The
> >> two 32bits RIP-relative addresses (0x4006e2, 0x4004dd in the instructions
> >> below) are relocation addresses in the .text section and need to be fixed
> >> up when loaded to memory.
> >>
> >> 4006df: ff 24 c5 e0 1a 40 00 jmpq *0x401ae0(,%rax,8) <-- there is a relocation address in 0x4006e2, [0x4006e2] = 0x401ae0
> >> 4004da: ff 24 c5 58 1e 40 00 jmpq *0x401e58(,%rax,8) <-- there is a relocation address in 0x4004dd, [0x4004dd] = 0x401e58
> >>
> >> In SecMain.dll.elf, you can clearly see that the type of the above two
> >> relocation addresses is R_X86_64_32S, which is a 32bits signed address,
> >> not R_X86_64_64.
> >> Relocation section '.rela.text' at offset 0x202b38 contains 78 entries:
> >> Offset       Info         Type         Sym. Value       Sym. Name + Addend
> >> 0000004006e2 00020000000b R_X86_64_32S 0000000000401ae0 .rodata + 0
> >> 0000004004dd 00020000000b R_X86_64_32S 0000000000401ae0 .rodata + 378
> >>
> >
> > Steven,
> >
> > For a 64-bit image you should never end up with R_X86_64_32S. Actually
> > PE/COFF does not have the concept of a signed relocation in an image.
> >
> > It is a little more clear in the PE/COFF specification but there are image
> > relocations and object relocations. The image relocations are the ones
> > that end up in the final linked image, and object relocations end up in
> > object files and are used by the linker to build the final image.
> >
> > For example if you look in the ELF Specification (Sys V ABI for AMD64) you
> > will notice that there are PC relative relocations. A PC relative
> > relocation would never end up in an image, they just exist to help the
> > linker. The linker is not going to know the final PC relative offset until
> > all of the code is constructed, so it is the linker's job to resolve these
> > relocations. For example if you try to access a global that is defined in
> > another module the compiler does not know the final PC relative offset
> > when the code is generated, thus the linker has to fix that up as part of
> > the final link.
> >
> > If you turn off LTO what does the code look like? Does it work? This looks
> > like a linker error, but it is hard to say if it is the compiler or linker
> > at fault.
> >
> > Writing a bug report against LLVM that requires the compiler writer to run
> > OVMF with a specific custom tool chain is probably never going to get
> > fixed. You are going to have to write some test code that reproduces the
> > failure and attach that example to the bug report.
> >
>
> I see from the llvm mail thread that you need to add the -pie linker flag
> with the small model did that resolve your issue?
>
> Thanks,
>
> Andrew Fish
>
> > Thanks,
> >
> > Andrew Fish
> >
> >
> >> The ELF should be linked at zero so that seems like the bug? It looks
> >> like your code is linking at 0x00400000. Is that breaking the code that
> >> is converting to PE/COFF?
> >>
> >>
> >> When our GenFw build tool fixes up the above two 32bits signed addresses
> >> with the SEC high address (e.g. 0xfffcdd54), it causes the two addresses
> >> to overflow to wrong negative values.
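
The overflow described above can be sketched in a few lines of C. This is an illustration under assumptions, not the GenFw fix-up code: it checks whether an address survives the 32-bit sign-extending encoding that R_X86_64_32S requires, and shows what the CPU would read back if the value were stored anyway. The helper names are hypothetical. The low link-time addresses from this thread (0x7d38, 0x401ae0) pass the check, while the SEC address 0xfffcdd54 fails it and reads back as a negative displacement of roughly 200K.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Value the CPU uses after sign-extending the stored 32-bit field.
 * The uint32_t -> int32_t conversion is implementation-defined in C for
 * out-of-range values; mainstream compilers wrap in two's complement.
 */
static int64_t sign_extended_disp32(uint64_t addr)
{
    return (int64_t)(int32_t)(uint32_t)addr;
}

/* True if addr can be encoded by an R_X86_64_32S relocation without loss. */
static bool fits_r_x86_64_32s(uint64_t addr)
{
    return (uint64_t)sign_extended_disp32(addr) == addr;
}
```

A converter could run a check like fits_r_x86_64_32s() on every relocated value and fail the build instead of silently emitting a wrapped address.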
> >>
> >> Why does the compiler use R_X86_64_32S, not R_X86_64_64, as the
> >> relocation address type in the .text section? It is because the compiler
> >> assumes the code model is small. If we let the compiler know we need the
> >> large code model, the compiler should take the "safe road" everywhere,
> >> using absolute 64-bit moves to refer to symbols and generating the
> >> correct relocation type. This is why I need LLVM LTO to support the large
> >> code model. Please ask the Apple compiler team to help me.
> >>
> >>
> >>> The small model does not restrict the address code can run at, all Mac
> >>> OS X applications run at addresses > 4 GB for example
> >> [Steven]: Are you sure a small code model Mac OS X application can run at
> >> an address > 4GB?
> >>
> >>
> >> Yes as you can't allocate memory < 4GB in an x86_64 Mac OS X App.
> >>
> >> My example command line app shows the default is > 4GB from the thread.
> >>
> >>> (lldb) dis -n get_constant -b
> >>> a.out`get_constant:
> >>> a.out[0x100000f8c] <+0>:  55                 pushq %rbp
> >>> a.out[0x100000f8d] <+1>:  48 89 e5           movq %rsp, %rbp
> >>> a.out[0x100000f90] <+4>:  8b 05 6a 00 00 00  movl 0x6a(%rip), %eax
> >>> a.out[0x100000f96] <+10>: 5d                 popq %rbp
> >>> a.out[0x100000f97] <+11>: c3                 retq
> >>
> >> Thanks,
> >>
> >> Andrew Fish
> >>
> >>
> >>
> >> Steven Shi
> >> Intel\SSG\STO\UEFI Firmware
> >>
> >> Tel: +86 021-61166522
> >> iNet: 821-6522
> >>
> >>
> >>> -----Original Message-----
> >>> From: af...@apple.com [mailto:af...@apple.com]
> >>> Sent: Tuesday, May 31, 2016 1:26 AM
> >>> To: Shi, Steven <steven....@intel.com>
> >>> Cc: Kinney, Michael D <michael.d.kin...@intel.com>; Justen, Jordan L
> >>> <jordan.l.jus...@intel.com>; edk2-devel@lists.01.org
> >>> Subject: Re: [edk2] edk2 llvm branch
> >>>
> >>>
> >>>> On May 29, 2016, at 10:47 PM, Shi, Steven <steven....@intel.com> wrote:
> >>>>
> >>>> Hi Andrew,
> >>>>
> >>>> I think I have root-caused the issue of Clang LTO X64 OVMF hanging in
> >>>> SEC. It is related to LLVM LTO not yet supporting the large code model,
> >>>> which means X64 LTO code cannot be loaded to run at a high address
> >>>> (above 2GB). Please see the details in the llvm thread discussion
> >>>> below. An Apple engineer (Mehdi) says ld64 on OS X does not support the
> >>>> large code model in LTO either, which means your Xcode LTO tool chain
> >>>> should have the same problem.
> >>>>
> >>>
> >>> Steven,
> >>>
> >>> We don't have any issues using Xcode. I think you are confused about
> >>> needing the large code model. The small model means the program can't be
> >>> larger than 2GB. The small model does not restrict the address code can
> >>> run at, all Mac OS X applications run at addresses > 4 GB for example.
> >>> The SEC (like all EFI drivers/applications) is linked at 0x0 (or 0x240
> >>> for ELF and Mach-O to make space for the PE/COFF header). The build
> >>> tools relocate the PE/COFF image to the addresses in the FV so they can
> >>> execute in place.
> >>>
> >>> The small and large models are about PIC (Position Independent Code) and
> >>> have to do with how big the offset is to the PC (%rip).
> >>> The small model is a 32-bit offset so you can only go 2GB in any given
> >>> direction, and that is what limits the size, but it makes each PC
> >>> relative instruction smaller (saves 4 bytes).
> >>>
> >>> For example if you read a global like this the compiler will generate
> >>> this code.
> >>> int constant = 0;
> >>>
> >>> int get_constant(void)
> >>> {
> >>>   return constant;
> >>> }
> >>>
> >>> (lldb) dis -n get_constant -b
> >>> a.out`get_constant:
> >>> a.out[0x100000f8c] <+0>:  55                 pushq %rbp
> >>> a.out[0x100000f8d] <+1>:  48 89 e5           movq %rsp, %rbp
> >>> a.out[0x100000f90] <+4>:  8b 05 6a 00 00 00  movl 0x6a(%rip), %eax
> >>> a.out[0x100000f96] <+10>: 5d                 popq %rbp
> >>> a.out[0x100000f97] <+11>: c3                 retq
> >>>
> >>> At link time the linker will figure out the offset (plus or minus) from
> >>> the PC required to access the global. In the example above the data
> >>> section follows the text section so the offset to the global is
> >>> PC + 0x6a. The small model implies that the %rip relative move is a
> >>> 32-bit operation and you can see the mov instruction is 6 bytes (2 op
> >>> codes and a 32-bit offset). This code does not have any issue running
> >>> near the X86 reset vector just under 4 GB.
> >>>
> >>> Your reported issue was a register-indirect absolute JMP instruction
> >>> which is going to be the same in both models. I think the way this works
> >>> is %rcx will be PIC (calculated relative to %rip) and the constant is an
> >>> offset to the jmp table and the table index.
> >>>
> >>> jmpq *.LJTI3_0(,%rcx,8)
> >>> jmpq 0xfffcdd54(,%rcx,8)
> >>>
> >>> It seems like this code is saying go backwards 200K which seems broken
> >>> in general (How big is your SEC?). So this is likely a code gen bug.
> >>>
> >>> This almost looks like a PE/COFF relocation was applied in error? Can
> >>> you look at the SecMain.efi (the one linked at zero) and disassemble
> >>> that instruction and see what value it contains.
> >>> You can also dump the PE/COFF and see if that location contains a
> >>> relocation. This looks like it may be a linker bug?
> >>>
> >>> Thanks,
> >>>
> >>> Andrew Fish
> >>>
> >> <SecMain.dll.elf><SecMain.dll.s>
> >>
> >> _______________________________________________
> >> edk2-devel mailing list
> >> edk2-devel@lists.01.org
> >> https://lists.01.org/mailman/listinfo/edk2-devel