On Wed, Mar 24, 2021 at 09:41:03AM +0100, Willy Tarreau wrote: > This is particularly strange. Could you please disassemble hlua_alloc ? > (dis hlua_alloc) ? > > You should find something like this: > > 0x00000000004476c3 <+147>: add DWORD PTR fs:0xfffffffffffdd678,0x1 > 0x00000000004476cc <+156>: lock add QWORD PTR [rip+0x25df8b],0x1 > # 0x6a5660 <_.44806> > 0x00000000004476d5 <+165>: lock add QWORD PTR [rip+0x25df8b],rbx > # 0x6a5668 <_.44806+8> > 0x00000000004476dd <+173>: mov rdi,rbx > 0x00000000004476e0 <+176>: call 0x4248b0 <malloc@plt> > 0x00000000004476e5 <+181>: sub DWORD PTR fs:0xfffffffffffdd678,0x1 > 0x00000000004476ee <+190>: test rax,rax > 0x00000000004476f1 <+193>: jne 0x44769d <hlua_alloc+109> > > The "add DWORD..." is the ++, and the "sub DWORD..." is the --, both of > them around the function call. > > If the variable had been declared static I would have agreed with the > possibility of reordering. But here it's global and it's not legal to > move it when you don't know if the called function will make use of it. > Otherwise global variables cannot be used anymore! > > By the way, in your case above it should even equal 2. That's rather > strange. > > One thing you can test is simply to declare it volatile. If it changes > anything, it will mean that gcc has some specific assumptions on this > malloc function (i.e. it decides it will never read your global variables) :-/
And this seems to be the case :-( I was wondering where the two lock add instructions were coming from and figure that I was building with DEBUG_MEM_STATS. Removing it resulted in the add/sub being remove, but only around malloc, not around realloc nor free. Malloc is declared with __attribute__((malloc)) on my system so I wondered if it could cause this. I tried adding this in the function: extern void *malloc (size_t __size); And it didn't change the output code. I just *renamed* the function to blablalloc() and now the code is correct again: 22cb: 64 83 04 25 00 00 00 addl $0x1,%fs:0x0 22d2: 00 01 22cf: R_X86_64_TPOFF32 hlua_not_dumpable 22d4: 48 89 df mov %rbx,%rdi 22d7: e8 00 00 00 00 callq 22dc <hlua_alloc+0x8c> 22d8: R_X86_64_PLT32 blablalloc-0x4 22dc: 64 83 2c 25 00 00 00 subl $0x1,%fs:0x0 22e3: 00 01 22e0: R_X86_64_TPOFF32 hlua_not_dumpable 22e5: 48 85 c0 test %rax,%rax 22e8: 75 be jne 22a8 <hlua_alloc+0x58> So yes, it looks like gcc decides that a function called "malloc" will never use your program's global variables but that "blablalloc" may. I have no explanation to this except "optimization craziness" resulting in breaking valid code, since it means that if I provide my own malloc function it might not work. Pfff.... At least the volatile solves it and is cleaner since it indicates it should assume it doesn't know it (e.g. for use from a signal handler which is what we're doing by the way): diff --git a/include/haproxy/hlua.h b/include/haproxy/hlua.h index 29656258e..46fbba6ec 100644 --- a/include/haproxy/hlua.h +++ b/include/haproxy/hlua.h @@ -50,7 +50,7 @@ void hlua_applet_tcp_fct(struct appctx *ctx); void hlua_applet_http_fct(struct appctx *ctx); struct task *hlua_process_task(struct task *task, void *context, unsigned short state); -extern THREAD_LOCAL unsigned int hlua_not_dumpable; +extern THREAD_LOCAL volatile unsigned int hlua_not_dumpable; #else /* USE_LUA */ /************************ For use when Lua is disabled ********************/ diff --git a/src/hlua.c b/src/hlua.c index 5bef67a48..6ccce72dd 100644 --- a/src/hlua.c +++ b/src/hlua.c @@ -237,7 +237,7 @@ struct hlua_mem_allocator { static struct hlua_mem_allocator hlua_global_allocator; /* > 0 if lua is in a non-rentrant part, thus with a non-dumpable stack */ -THREAD_LOCAL unsigned int hlua_not_dumpable = 0; +THREAD_LOCAL unsigned int volatile hlua_not_dumpable = 0; /* These functions converts types between HAProxy internal args or * sample and LUA types. Another function permits to check if the Willy