I am calling a function in another x64 DLL with the following C signature: int napi_create_double(void*, double, void*);

The first time I call this function, the 'double' argument ends up as 1.20305e-307 inside napi_create_double, no matter what value the caller gives. The 'double' is corrupted. Calls after the first don't corrupt the 'double'.

The cause is ntdll.dll, eventually called by MinGW's __delayLoadHelper2, modifying the xmm1 register:

#0  0x00007ffd26ce3006 in ntdll!RtlLookupFunctionEntry () from C:\WINDOWS\SYSTEM32\ntdll.dll
#1  0x00007ffd26ce05e8 in ntdll!LdrGetProcedureAddressForCaller () from C:\WINDOWS\SYSTEM32\ntdll.dll
#2  0x00007ffd26ce00a5 in ntdll!LdrGetProcedureAddressForCaller () from C:\WINDOWS\SYSTEM32\ntdll.dll
#3  0x00007ffd245b53dc in KERNELBASE!GetProcAddressForCaller () from C:\WINDOWS\System32\KernelBase.dll
#4  0x00007ffcd7b7ca6f in __delayLoadHelper2 (pidd=0x7ffcd7b8ba70 <__DELAY_IMPORT_DESCRIPTOR_node_napi_lib>, ppfnIATEntry=0x7ffcd7ecd134 <__imp_napi_create_double>)
    at C:/M/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/misc/delayimp.c:209
#5  0x00007ffcd7b717c9 in __tailMerge_node_napi_lib () from MYDLL.dll
#6  0x000002ad2fe84c50 in ?? ()

   0x00007ffd26ce2ffb <+1051>: movups (%rdx),%xmm0
   0x00007ffd26ce2ffe <+1054>: movups %xmm0,(%rsi)
   0x00007ffd26ce3001 <+1057>: movsd 0x10(%rdx),%xmm1
=> 0x00007ffd26ce3006 <+1062>: movsd %xmm1,0x10(%rsi)
   0x00007ffd26ce300b <+1067>: mov (%rsi),%rbp
   0x00007ffd26ce300e <+1070>: mov %r11,%rax
   0x00007ffd26ce3011 <+1073>: lock cmpxchg %r12,0x1384d6(%rip) # 0x7ffd26e1b4f0
   0x00007ffd26ce301a <+1082>: jne 0x7ffd26ce3102 <ntdll!RtlLookupFunctionEntry+1314>

According to Windows x64 documentation, xmm1 is a volatile register:
https://docs.microsoft.com/en-us/cpp/build/x64-software-conventions?redirectedfrom=MSDN&view=msvc-170

I think the solution is for dll's delaylib trampoline to save xmm1 on the stack before calling __delayLoadHelper2. I made a patch which does this, and it fixes the bug for my code. See attached patch.

I think my patch has two problems:

1. AVX/vmovupd/ymm might not be usable on the target machine, but saving just xmm isn't enough. Should we perform a CPUID check?
2. We store unaligned with vmovupd. Storing aligned with vmovapd would be better. I haven't looked into how to align ymm registers when storing on the stack.

I'd love to get this bug fixed so others don't spend two days debugging assembly code!

Matthew "strager" Glazar
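
On problem 2, here is a rough sketch in C of the address arithmetic an aligned save area would need. It is not taken from the patch. The names align_down and ymm_save_area are hypothetical, and the 128-byte size assumes the trampoline saves four ymm registers (the registers that carry the first four floating-point arguments on Windows x64). vmovapd with a ymm operand needs a 32-byte aligned address, and since the stack grows down, rounding the stack pointer down is what an `and $-32, %rsp` would do in the trampoline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* vmovapd with a ymm operand requires a 32-byte aligned address. */
enum { YMM_ALIGN = 32 };

/* Round addr down to a multiple of align, which must be a power of two.
   Rounding down matches a stack that grows toward lower addresses. */
static uintptr_t align_down(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~(align - 1);
}

/* Hypothetical helper: reserve `bytes` below the incoming stack pointer
   and return the 32-byte aligned start of that save area. */
static uintptr_t ymm_save_area(uintptr_t sp, size_t bytes)
{
    return align_down(sp - bytes, YMM_ALIGN);
}
```

Because the amount subtracted by the rounding depends on the incoming stack pointer, the trampoline would presumably also have to keep the original %rsp somewhere (for example in a frame register) so it can restore it before jumping to the resolved function.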
dlltool.patch