https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70232

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |thopre01 at gcc dot gnu.org

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Arnd Bergmann from comment #4)
> I've tried out a few more things as well, to see if the alignment of the
> struct lpfc_name type or the builtin memcpy makes a difference. Replacing the
> array of eight bytes with a single uint64_t and scalar operations instead of
> string functions makes very little difference, so it seems to be neither of
> them.
>
> However, I think the wwn_to_uint64_t() function is what causes the problem.
> This is supposed to be turned into a direct load or a byte reversing load
> depending on endianness, but this apparently does not happen.

I'm not aware of such a transform - we have the 'bswap' pass on GIMPLE but that
only looks for the direct load case (-mbig-endian):

64 bit load in target endianness found at: _95 = MEM[(uint8_t *)vport_wwpn_14(D)];
64 bit load in target endianness found at: _125 = MEM[(uint8_t *)target_wwpn_16(D)];
64 bit load in target endianness found at: _167 = MEM[(uint8_t *)vport_wwpn_14(D)];
64 bit load in target endianness found at: _197 = MEM[(uint8_t *)target_wwpn_16(D)];
64 bit load in target endianness found at: _236 = MEM[(uint8_t *)vport_wwpn_14(D)];
64 bit load in target endianness found at: _266 = MEM[(uint8_t *)target_wwpn_16(D)];

as there is no "standard" target independent way to express a byte reversed
load (optab or so). The closest we'd have is to "vectorize" this as

  tem = load v8qi;
  tem = vec_perm <tem, tem, { 7, 6, 5, 4, 3, 2, 1, 0 }>;
  ... = VIEW_CONVERT <uint64_t, tem>;

if both the vector mode exists and the constant permute is handled by the
target. The target would then need to combine the load and the permute into
a reversing load (if the target indeed has such an instruction).
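
For illustration, here is a minimal C sketch of the three forms discussed above. It is not taken from the kernel source or from GCC: the function names are hypothetical, and the first function only assumes that wwn_to_uint64_t() assembles a big-endian value byte by byte. The third function mirrors the load / vec_perm / VIEW_CONVERT sequence using the vector extensions (`__builtin_shuffle` on GCC, `__builtin_shufflevector` on Clang).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for wwn_to_uint64_t(): assemble a big-endian
   64-bit value one byte at a time (assumed shape, not the kernel code). */
static uint64_t wwn_bytes_to_u64(const uint8_t *wwn)
{
    return (uint64_t)wwn[0] << 56 | (uint64_t)wwn[1] << 48 |
           (uint64_t)wwn[2] << 40 | (uint64_t)wwn[3] << 32 |
           (uint64_t)wwn[4] << 24 | (uint64_t)wwn[5] << 16 |
           (uint64_t)wwn[6] << 8  | (uint64_t)wwn[7];
}

/* The same value as one 64-bit load: a direct load on big-endian
   targets, a load followed by a byte swap on little-endian ones. */
static uint64_t wwn_load_bswap(const uint8_t *wwn)
{
    uint64_t v;
    memcpy(&v, wwn, sizeof v);          /* alignment-safe 64-bit load */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap64(v);
#endif
    return v;
}

/* The "vectorized" form: load v8qi, constant permute, view-convert. */
typedef uint8_t v8qi __attribute__((vector_size(8)));

static uint64_t wwn_load_permute(const uint8_t *wwn)
{
    v8qi tem;
    uint64_t v;
    memcpy(&tem, wwn, sizeof tem);      /* tem = load v8qi */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#ifdef __clang__
    tem = __builtin_shufflevector(tem, tem, 7, 6, 5, 4, 3, 2, 1, 0);
#else
    const v8qi rev = { 7, 6, 5, 4, 3, 2, 1, 0 };
    tem = __builtin_shuffle(tem, rev);  /* byte-reversing permute */
#endif
#endif
    memcpy(&v, &tem, sizeof v);         /* VIEW_CONVERT <uint64_t, tem> */
    return v;
}
```

All three return the same value for the same eight input bytes. Whether the compiler actually reduces the first form to the second, or combines the permute in the third into a single reversing load, depends on the GCC version and the target.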

> Adding -mbig-endian to the compiler flags brings the stack usage down, so
> presumably the optimization step that identifies byteswap patterns is what
> causes the stack growth.