https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70557

            Bug ID: 70557
           Summary: uint64_t zeroing on 32-bit hardware
           Product: gcc
           Version: 5.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: acahalan at gmail dot com
  Target Milestone: ---

Created attachment 38196
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38196&action=edit
C source, gcc 5.3.0 assembly output, IDA Pro disassembly

This is C compiled with gcc 5.3.0, targeting the MCF5272 ColdFire (m68k without
alignment constraints).

To clear 8 bytes of memory, gcc should always issue a pair of clr.L
instructions. 

This applies both when the address is known to the linker (the address should
be contained in an instruction that loads an address register) and when the
address is supplied as a function argument (the address should be loaded into a
register which the clr.L will then use).

Because the hardware is 32-bit, a 64-bit value should be handled the same as a
pair of 32-bit values. Because there is no alignment requirement, 8 adjacent
8-bit values (total of 64 bits) should likewise be handled the same.

All 6 cases (3 access sizes times 2 ways to address the data) are shown in the
provided attachment. Only one of the 6 cases seems optimal, the one named
"clear32p" which takes a pointer to a pair of 32-bit values as a function
argument. The case named "clear32", referring to global data, isn't bad... but
really the address should be loaded into an address register to save 2 bytes.
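
For readers without the attachment, the six cases might look roughly like the
sketch below. This is a hypothetical reconstruction, not the attached source:
only the names "clear32" and "clear32p" come from this report, while the other
function names, the global variable names, and the exact function bodies are
assumptions made for illustration.

```c
#include <stdint.h>

/* Assumed globals: 8 bytes each, viewed at three access sizes. */
uint64_t g64;
uint32_t g32[2];
uint8_t  g8[8];

/* Address known to the linker (global data). */
void clear64(void) { g64 = 0; }                    /* name assumed */
void clear32(void) { g32[0] = 0; g32[1] = 0; }     /* named in the report */
void clear8(void)                                  /* name assumed */
{
    g8[0] = 0; g8[1] = 0; g8[2] = 0; g8[3] = 0;
    g8[4] = 0; g8[5] = 0; g8[6] = 0; g8[7] = 0;
}

/* Address supplied as a function argument. */
void clear64p(uint64_t *p) { *p = 0; }             /* name assumed */
void clear32p(uint32_t *p) { p[0] = 0; p[1] = 0; } /* named in the report */
void clear8p(uint8_t *p)                           /* name assumed */
{
    p[0] = 0; p[1] = 0; p[2] = 0; p[3] = 0;
    p[4] = 0; p[5] = 0; p[6] = 0; p[7] = 0;
}
```

All six functions zero the same 8 bytes, so on a 32-bit target without
alignment requirements the request is that each group of three compile to the
same code.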

Though not the worst for performance (that honor going to the 8-bit functions),
the 64-bit functions are particularly painful to look at. With these, gcc
clears out two different registers and then moves both of them into memory. An
obvious optimization would be to clear only a single register and use it twice.
Another obvious optimization would be to directly clear the memory via a clr.L
that uses memory addressing, either absolute or register-based as appropriate,
though loading an address register is even better.

In any case, the 6 functions in this example should compile to only 2 distinct
kinds of result: one for the global-data case and one for the pointer-argument
case. The access size should not change the resulting assembly.
