https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69874
Markus Trippelsdorf <trippels at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl.tools at gmail dot com

--- Comment #5 from Markus Trippelsdorf <trippels at gcc dot gnu.org> ---
Ah the misalignment will be fixed for 4.9.4, PR58066.

H.J.: sysdeps/x86_64/dl-trampoline.S says:

#ifndef DL_STACK_ALIGNMENT
/* Due to GCC bug:

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066

   __tls_get_addr may be called with 8-byte stack alignment. Although
   this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume
   that stack will be always aligned at 16 bytes. We use unaligned
   16-byte move to load and store SSE registers, which has no penalty
   on modern processors if stack is 16-byte aligned. */
# define DL_STACK_ALIGNMENT 8
#endif

#ifndef DL_RUNIME_UNALIGNED_VEC_SIZE
/* The maximum size of unaligned vector load and store. */
# define DL_RUNIME_UNALIGNED_VEC_SIZE 16
#endif

/* True if _dl_runtime_resolve should align stack to VEC_SIZE bytes. */
#define DL_RUNIME_RESOLVE_REALIGN_STACK \
  (VEC_SIZE > DL_STACK_ALIGNMENT \
   && VEC_SIZE > DL_RUNIME_UNALIGNED_VEC_SIZE)

this apparently doesn't work for _dl_runtime_resolve_sse.
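
For illustration only, the quoted condition can be evaluated for a few vector sizes. This is a rough sketch, not glibc code: realign_stack is a hypothetical helper, and the assumption that the SSE variant is built with VEC_SIZE 16 (the width of an XMM register) is mine, since the excerpt does not show where VEC_SIZE is defined. The two constants are copied from the excerpt, including the DL_RUNIME spelling.

```c
#include <stdbool.h>

/* Values copied from the quoted dl-trampoline.S excerpt. */
#define DL_STACK_ALIGNMENT 8
#define DL_RUNIME_UNALIGNED_VEC_SIZE 16

/* Hypothetical helper (not in glibc): the quoted
   DL_RUNIME_RESOLVE_REALIGN_STACK condition, written as a function
   of the vector size instead of a per-variant macro. */
bool realign_stack(int vec_size)
{
    return vec_size > DL_STACK_ALIGNMENT
           && vec_size > DL_RUNIME_UNALIGNED_VEC_SIZE;
}
```

Under that assumption, realign_stack(16) is false because 16 > 16 does not hold, so the SSE variant would not realign the stack and would depend entirely on its 16-byte loads and stores being unaligned-safe when the incoming stack is only 8-byte aligned. For 32 and 64 both comparisons hold and the stack would be realigned.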