https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70557
            Bug ID: 70557
           Summary: uint64_t zeroing on 32-bit hardware
           Product: gcc
           Version: 5.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: acahalan at gmail dot com
  Target Milestone: ---

Created attachment 38196
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38196&action=edit
C source, gcc 5.3.0 assembly output, IDA Pro disassembly

This is C with gcc 5.3.0 targeting the MCF5272 coldfire (m68k w/o alignment constraint).

To clear 8 bytes of memory, gcc should always issue a pair of clr.L instructions. This applies both when the address is known to the linker (the address should be contained in an instruction that loads an address register) and when the address is supplied as a function argument (the address should be loaded into a register which the clr.L will then use).

Because the hardware is 32-bit, a 64-bit value should be handled the same as a pair of 32-bit values. Because there is no alignment requirement, 8 adjacent 8-bit values (total of 64 bits) should likewise be handled the same. All 6 cases (3 access sizes times 2 ways to address the data) are shown in the provided attachment.

Only one of the 6 cases seems optimal, the one named "clear32p" which takes a pointer to a pair of 32-bit values as a function argument. The case named "clear32", referring to global data, isn't bad... but really the address should be loaded into an address register to save 2 bytes.

Though not the worst for performance (that honor going to the 8-bit functions), the 64-bit functions are particularly painful to look at. With these, gcc clears out two different registers and then moves both of them into memory. An obvious optimization would be to clear only a single register and use it twice. Another obvious optimization would be to directly clear the memory via a clr.L that uses memory addressing, either absolute or register-based as appropriate, though loading an address register is even better.

In any case, the 6 functions in this example should compile to 2 distinct kinds of result. The access size should not change the resulting assembly.
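
For readers without the attachment, the six cases described above might look roughly like the sketch below. This is not the attached source: only the names "clear32" and "clear32p" appear in this report, and the other function names, the global names (g8, g32, g64) and the exact function bodies are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical globals whose addresses are known to the linker.
 * The names g8, g32 and g64 are illustrative, not from the attachment. */
uint8_t  g8[8];
uint32_t g32[2];
uint64_t g64;

/* Global data, 3 access sizes. Each function zeroes the same 8 bytes' worth. */
void clear8(void)
{
    for (int i = 0; i < 8; i++)
        g8[i] = 0;
}

void clear32(void)
{
    g32[0] = 0;
    g32[1] = 0;
}

void clear64(void)
{
    g64 = 0;
}

/* Pointer argument, 3 access sizes. */
void clear8p(uint8_t *p)
{
    for (int i = 0; i < 8; i++)
        p[i] = 0;
}

void clear32p(uint32_t *p)
{
    p[0] = 0;
    p[1] = 0;
}

void clear64p(uint64_t *p)
{
    *p = 0;
}
```

The compiler flags used for the attachment are not stated in this report. Assuming a cross toolchain installed as m68k-elf-gcc (the name depends on how the toolchain was built), something like `m68k-elf-gcc -mcpu=5272 -O2 -S` should produce assembly that can be compared across the six functions.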