> On Nov 3, 2014, at 12:24 PM, Scott Duplichan <sc...@notabs.org> wrote:
>
> Laszlo Ersek [mailto:ler...@redhat.com] wrote:
>
> ]On 10/29/14 05:59, Scott Duplichan wrote:
> ]> Optimization is not enabled for x64 builds using gcc 4.4-4.9. For IA32
> ]> builds, -Os (optimize for small code size) is used. Why is this? Apparently
> ]> it is because variable argument list handling fails when gcc X64 optimization
> ]> is enabled. The solution is an improvement to the patch of SVN rev 10440:
> ]> http://sourceforge.net/p/edk2/mailman/message/25121111/
> ]
> ]My reading of r10440 is different. As far as I understand,
> ]
> ]  (gcc-4.4, X64, stdarg builtins)
> ]
> ]is simply a broken combination, regardless of optimization.
>
> You are right about gcc X64 builds using the standard (native)
> stdarg builtins. Without the original r10440 patch, a test using
> Duet crashes early on. The exception handler dump has bogus values,
> probably due to the same stdarg problem.
>
> My point was why can't -Os be used for the current gcc X64 build like
> it is for the IA32 build? Maybe r10440 is not relevant enough to even
> be mentioned. What I found is adding -Os to the X64 Duet project
> causes the 'g' (GUID) format to malfunction. There may be other
> formatting problems, but this one is most obvious in the log file:
>
> X64 Duet boot log from gcc X64 build (standard, correct):
> WELCOME TO EFI WORLD!
> InstallProtocolInterface: D2B2B828-0826-48A7-B3DF-983C006024F0 1FDF9D58
> HOBLIST address in DXE = 0x1F3DA018
> Memory Allocation 0x00000004 0x1FD69000 - 0x1FD88FFF
> Memory Allocation 0x00000004 0x1F964000 - 0x1FD68FFF
> Memory Allocation 0x00000003 0x1FDB9000 - 0x1FE0FFFF
> FV Hob 0x1FE10000 - 0x1FFAFFFF
> InstallProtocolInterface: D8117CFE-94A6-11D4-9A3A-0090273FC14D 1FDF2AC0
> InstallProtocolInterface: 8F644FA9-E850-4DB1-9CE2-0B44698E8DA4 1F3D6A30
> InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 1F3D7818
>
> X64 Duet boot log from gcc X64 build (with -Os added):
> WELCOME TO EFI WORLD!
> InstallProtocolInterface: 00000000-30300000-332D3030-2D30303030 1FE6F8C0
> HOBLIST address in DXE = 0x1F461018
> Memory Allocation 0x00000004 0x1FDF0000 - 0x1FE0FFFF
> Memory Allocation 0x00000004 0x1F9EB000 - 0x1FDEFFFF
> Memory Allocation 0x00000003 0x1FE40000 - 0x1FE7FFFF
> FV Hob 0x1FE80000 - 0x1FFAFFFF
> InstallProtocolInterface: 1F9EAB08-46312000-342D3830-2D30303030 1FE69070
> InstallProtocolInterface: 00000001-30300200-332D3130-2D30303230 1F45DA30
> InstallProtocolInterface: 00000001-30308FA1-332D3130-2D31414630 1F45E818
>
> If "-Os -mabi=ms" is used for the gcc X64 build, then the pre-r10440
> method (using the native stdarg builtins) works. But that is just hiding
> the problem.
>
> The __builtin_ms_va_* macros for cross ABI use are not well documented
> as far as I can find. File cross-stdarg.h is about it. But they have been
> around for a long time, at least since gcc 4.4.
>
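
Since the cross-ABI builtins are thinly documented, here is a minimal sketch of how they are usually wired together. The MY_* names are hypothetical stand-ins, not the actual edk2 macros, and the non-x86-64 branch is only a fallback I added so the file compiles elsewhere. Note that there is no __builtin_ms_va_arg; gcc's cross-stdarg.h uses the plain __builtin_va_arg on the MS-style list.

```c
#include <stdarg.h>

/* Hypothetical wrappers for illustration; the real edk2 macro
   definitions may differ from these. */
#if defined(__x86_64__) && defined(__GNUC__)
  #define MY_EFIAPI           __attribute__((ms_abi))
  typedef __builtin_ms_va_list MY_VA_LIST;
  #define MY_VA_START(m, p)   __builtin_ms_va_start(m, p)
  #define MY_VA_ARG(m, t)     __builtin_va_arg(m, t)
  #define MY_VA_END(m)        __builtin_ms_va_end(m)
#else
  /* Fallback (my assumption): native stdarg on other targets. */
  #define MY_EFIAPI
  typedef va_list MY_VA_LIST;
  #define MY_VA_START(m, p)   va_start(m, p)
  #define MY_VA_ARG(m, t)     va_arg(m, t)
  #define MY_VA_END(m)        va_end(m)
#endif

/* The variadic function itself carries the MS ABI attribute, so the
   caller's argument passing and the callee's marker agree. */
MY_EFIAPI int sum_ints(int count, ...)
{
    MY_VA_LIST marker;
    int total = 0;
    int i;

    MY_VA_START(marker, count);
    for (i = 0; i < count; i++) {
        total += MY_VA_ARG(marker, int);
    }
    MY_VA_END(marker);
    return total;
}
```

Calling sum_ints(7, 1, 2, 3, 4, 5, 6, 7) exercises both the register-passed and the stack-passed arguments, which is where a mismatch between the calling convention and the marker type tends to show up.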
It seems that if you make the EFI VA_LIST point to __builtin_ms_va_*, then you
need to decorate any function using VA_LIST with EFIAPI to make sure the code
generation, calling convention, and va_list all match up?

The EFI rules are documented here:
http://msdn.microsoft.com/en-us/library/9b372w95.aspx

From debugging problems like this in the past, you can usually figure it out
from the assembly. The EFI/VC++ rules are very simple: variadic arguments are
passed like ordinary parameters, and the marker is a pointer into something
that looks like a stack frame. The Unix version is much more complicated, and
the marker is sometimes a data structure. The rules in Unix for floating point
are very complex, so you tend to see more code overhead in the Unix flow.

In the following example I compiled the same file first for the Unix calling
convention and then for the EFI calling convention. Note the added complexity
in the p() function assembly introduced by the more complex Unix rules. Also
note that Unix passes 6 values in registers while EFI/VC++ passes just 4, and
the order of the registers is different.

When we were adding the x86_64-pc-win32-macho target to clang, we found a few
places where the compiler emitted the wrong form of the var args. We tracked
them down by looking at the assembly, then built a simple standalone case to
file a bug against the compiler.

~/work/Compiler>cat va.c
#include <stdarg.h>

int printf (const char *, ...);

void p2 (int a, __builtin_va_list *valist)
{
  int i;
  for (i=0; i <a; i++) {
    printf ("%d\n", __builtin_va_arg (*valist, int));
  }
}

void p (int a, ...)
{
  __builtin_va_list valist;
  __builtin_va_start (valist, a);
  p2 (a, &valist);
  __builtin_va_end (valist);
}

int main ()
{
  p (7 , 1, 2, 3, 4, 5, 6, 7);
  return 0;
}

~/work/Compiler>clang -S -Os va.c
~/work/Compiler>cat va.S
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_p2
_p2:                                    ## @p2
	.cfi_startproc
## BB#0:
	pushq	%rbp
Ltmp3:
	.cfi_def_cfa_offset 16
Ltmp4:
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
Ltmp5:
	.cfi_def_cfa_register %rbp
	pushq	%r15
	pushq	%r14
	pushq	%rbx
	pushq	%rax
Ltmp6:
	.cfi_offset %rbx, -40
Ltmp7:
	.cfi_offset %r14, -32
Ltmp8:
	.cfi_offset %r15, -24
	movq	%rsi, %r14
	movl	%edi, %ebx
	testl	%ebx, %ebx
	jle	LBB0_6
## BB#1:                                ## %.lr.ph
	leaq	L_.str(%rip), %r15
LBB0_2:                                 ## =>This Inner Loop Header: Depth=1
	movslq	(%r14), %rcx
	cmpq	$40, %rcx
	ja	LBB0_4
## BB#3:                                ## in Loop: Header=BB0_2 Depth=1
	movq	%rcx, %rax
	addq	16(%r14), %rax
	leal	8(%rcx), %ecx
	movl	%ecx, (%r14)
	jmp	LBB0_5
LBB0_4:                                 ## in Loop: Header=BB0_2 Depth=1
	movq	8(%r14), %rax
	leaq	8(%rax), %rcx
	movq	%rcx, 8(%r14)
LBB0_5:                                 ## in Loop: Header=BB0_2 Depth=1
	movl	(%rax), %esi
	xorl	%eax, %eax
	movq	%r15, %rdi
	callq	_printf
	decl	%ebx
	jne	LBB0_2
LBB0_6:                                 ## %._crit_edge
	addq	$8, %rsp
	popq	%rbx
	popq	%r14
	popq	%r15
	popq	%rbp
	retq
	.cfi_endproc

	.globl	_p
_p:                                     ## @p
	.cfi_startproc
## BB#0:
	pushq	%rbp
Ltmp12:
	.cfi_def_cfa_offset 16
Ltmp13:
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
Ltmp14:
	.cfi_def_cfa_register %rbp
	pushq	%rbx
	subq	$216, %rsp
Ltmp15:
	.cfi_offset %rbx, -24
	testb	%al, %al
	je	LBB1_2
## BB#1:
	movaps	%xmm0, -176(%rbp)
	movaps	%xmm1, -160(%rbp)
	movaps	%xmm2, -144(%rbp)
	movaps	%xmm3, -128(%rbp)
	movaps	%xmm4, -112(%rbp)
	movaps	%xmm5, -96(%rbp)
	movaps	%xmm6, -80(%rbp)
	movaps	%xmm7, -64(%rbp)
LBB1_2:
	movq	%r9, -184(%rbp)
	movq	%r8, -192(%rbp)
	movq	%rcx, -200(%rbp)
	movq	%rdx, -208(%rbp)
	movq	%rsi, -216(%rbp)
	movq	___stack_chk_guard@GOTPCREL(%rip), %rbx
	movq	(%rbx), %rax
	movq	%rax, -16(%rbp)
	leaq	-224(%rbp), %rax
	movq	%rax, -32(%rbp)
	leaq	16(%rbp), %rax
	movq	%rax, -40(%rbp)
	movl	$48, -44(%rbp)
	movl	$8, -48(%rbp)
	leaq	-48(%rbp), %rsi
	callq	_p2
	movq	(%rbx), %rax
	cmpq	-16(%rbp), %rax
	jne	LBB1_4
## BB#3:
	addq	$216, %rsp
	popq	%rbx
	popq	%rbp
	retq
LBB1_4:
	callq	___stack_chk_fail
	.cfi_endproc

	.globl	_main
_main:                                  ## @main
	.cfi_startproc
## BB#0:
	pushq	%rbp
Ltmp18:
	.cfi_def_cfa_offset 16
Ltmp19:
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
Ltmp20:
	.cfi_def_cfa_register %rbp
	subq	$16, %rsp
	movl	$2, %edx
	movl	$3, %ecx
	xorl	%eax, %eax
	movl	$7, 8(%rsp)
	movl	$6, (%rsp)
	movl	$7, %edi
	movl	$1, %esi
	movl	$4, %r8d
	movl	$5, %r9d
	callq	_p
	xorl	%eax, %eax
	addq	$16, %rsp
	popq	%rbp
	retq
	.cfi_endproc

	.section	__TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
	.asciz	"%d\n"

	.subsections_via_symbols

~/work/Compiler>clang -S -Os va.c -target x86_64-pc-win32-macho
~/work/Compiler>cat va.S
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_p2
_p2:                                    ## @p2
	.cfi_startproc
## BB#0:
	pushq	%rbp
Ltmp3:
	.cfi_def_cfa_offset 16
Ltmp4:
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
Ltmp5:
	.cfi_def_cfa_register %rbp
	pushq	%rsi
	pushq	%rdi
	pushq	%rbx
	subq	$40, %rsp
Ltmp6:
	.cfi_offset %rbx, -40
Ltmp7:
	.cfi_offset %rdi, -32
Ltmp8:
	.cfi_offset %rsi, -24
	movq	%rdx, %rsi
	movl	%ecx, %edi
	testl	%edi, %edi
	jle	LBB0_3
## BB#1:                                ## %.lr.ph
	leaq	L_.str(%rip), %rbx
LBB0_2:                                 ## =>This Inner Loop Header: Depth=1
	movq	(%rsi), %rax
	leaq	8(%rax), %rcx
	movq	%rcx, (%rsi)
	movl	(%rax), %edx
	movq	%rbx, %rcx
	callq	_printf
	decl	%edi
	jne	LBB0_2
LBB0_3:                                 ## %._crit_edge
	addq	$40, %rsp
	popq	%rbx
	popq	%rdi
	popq	%rsi
	popq	%rbp
	retq
	.cfi_endproc

	.globl	_p
_p:                                     ## @p
	.cfi_startproc
## BB#0:
	pushq	%rbp
Ltmp12:
	.cfi_def_cfa_offset 16
Ltmp13:
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
Ltmp14:
	.cfi_def_cfa_register %rbp
	pushq	%rsi
	pushq	%rdi
	subq	$64, %rsp
Ltmp15:
	.cfi_offset %rdi, -32
Ltmp16:
	.cfi_offset %rsi, -24
	movl	%ecx, %esi
	movq	%r9, 40(%rbp)
	movq	%r8, 32(%rbp)
	movq	%rdx, 24(%rbp)
	leaq	24(%rbp), %rax
	movq	%rax, -48(%rbp)
	testl	%esi, %esi
	jle	LBB1_3
## BB#1:                                ## %.lr.ph.i
	leaq	L_.str(%rip), %rdi
LBB1_2:                                 ## =>This Inner Loop Header: Depth=1
	movq	-48(%rbp), %rax
	leaq	8(%rax), %rcx
	movq	%rcx, -48(%rbp)
	movl	(%rax), %edx
	movq	%rdi, %rcx
	callq	_printf
	decl	%esi
	jne	LBB1_2
LBB1_3:                                 ## %p2.exit
	addq	$64, %rsp
	popq	%rdi
	popq	%rsi
	popq	%rbp
	retq
	.cfi_endproc

	.globl	_main
_main:                                  ## @main
	.cfi_startproc
## BB#0:
	pushq	%rbp
Ltmp19:
	.cfi_def_cfa_offset 16
Ltmp20:
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
Ltmp21:
	.cfi_def_cfa_register %rbp
	subq	$64, %rsp
	movl	$7, 56(%rsp)
	movl	$6, 48(%rsp)
	movl	$5, 40(%rsp)
	movl	$4, 32(%rsp)
	movl	$7, %ecx
	movl	$1, %edx
	movl	$2, %r8d
	movl	$3, %r9d
	callq	_p
	xorl	%eax, %eax
	addq	$64, %rsp
	popq	%rbp
	retq
	.cfi_endproc

	.section	__TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
	.asciz	"%d\n"

	.subsections_via_symbols

> Thanks,
> scott
>
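
To make the "marker is sometimes a data structure" point concrete, here is a rough sketch that peeks at the Unix (System V x86-64) marker right after va_start. The struct sysv_va_mirror, last_marker, and capture_marker names are mine, purely for illustration; the field layout follows the System V psABI as I understand it, and on any other target the function simply reports that nothing was captured.

```c
#include <stdarg.h>
#include <string.h>

/* Hypothetical mirror of the System V x86-64 va_list element.
   Field names follow the psABI; this type is for illustration only. */
struct sysv_va_mirror {
    unsigned int gp_offset;         /* offset of next GP register slot */
    unsigned int fp_offset;         /* offset of next XMM register slot */
    void        *overflow_arg_area; /* next stack-passed argument */
    void        *reg_save_area;     /* where the prologue spilled registers */
};

/* Filled in by capture_marker() when it returns 1. */
struct sysv_va_mirror last_marker;

/* Copies the marker produced by va_start into last_marker.
   Returns 1 on System V x86-64, 0 on any other target. */
int capture_marker(int a, ...)
{
#if defined(__x86_64__) && !defined(_WIN32) && !defined(__CYGWIN__)
    va_list ap;

    _Static_assert(sizeof(va_list) == sizeof(struct sysv_va_mirror),
                   "unexpected va_list layout");
    va_start(ap, a);
    memcpy(&last_marker, ap, sizeof last_marker);
    va_end(ap);
    return 1;
#else
    (void)a;
    return 0;
#endif
}
```

With one named integer argument, gp_offset should come back as 8 and fp_offset as 48, which lines up with the movl $8, -48(%rbp) and movl $48, -44(%rbp) stores in the Unix p() listing. The EFI/VC++ marker has no such fields; it is just the single pointer that p() builds with leaq 24(%rbp), %rax.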
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/edk2-devel