https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883
--- Comment #17 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Jeffrey A. Law from comment #16)
> The issue here (of course) is that aarch64 has a different set of defaults
> for when to open-code vs loop vs function call. My attempts to pick a
> better size for the objects results in failures on other targets.
>
> Do we have a method on aarch64 to tune this stuff via flags? Otherwise I'm
> likely to just xfail aarch64 and move on since DSE is doing what we want at
> this point if given sane input.

I don't know; this issue doesn't seem related to any backend setting. It is a typical inline memset expansion. GCC generally handles structures whose size is not a multiple of 4 or 8 inefficiently, because the mid-end can't deal with overlapping accesses of different sizes. The code becomes efficient if I change the size of the array from 7 to 8. So there is a real issue here, but maybe you'd prefer a new bug report for that?
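Two ways to zero a 7-byte region are sketched below to illustrate "overlapping accesses". The code is not taken from the bug report, and the function names `clear7_overlapping` and `clear7_split` are hypothetical. The overlapping version covers 7 bytes with two 4-byte stores, at offsets 0 and 3, so byte 3 is written twice. Without overlap, the same region needs 4 + 2 + 1 bytes, which is three stores. Using `memcpy` keeps the possibly unaligned stores well-defined in C. With optimization enabled, compilers usually lower each fixed-size `memcpy` to a single store, but that depends on the target.

```c
#include <stdint.h>
#include <string.h>

/* Zero bytes 0..6 with two overlapping 4-byte stores:
   bytes 0..3, then bytes 3..6 (byte 3 is written twice). */
static void clear7_overlapping(unsigned char *buf)
{
    const uint32_t zero = 0;
    memcpy(buf, &zero, sizeof zero);      /* bytes 0..3 */
    memcpy(buf + 3, &zero, sizeof zero);  /* bytes 3..6 */
}

/* Zero bytes 0..6 without overlap: 4 + 2 + 1 bytes, i.e. three stores. */
static void clear7_split(unsigned char *buf)
{
    const uint32_t z4 = 0;
    const uint16_t z2 = 0;
    memcpy(buf, &z4, sizeof z4);      /* bytes 0..3 */
    memcpy(buf + 4, &z2, sizeof z2);  /* bytes 4..5 */
    buf[6] = 0;                       /* byte 6 */
}
```

Both functions leave any byte after offset 6 untouched. For an 8-byte object, neither trick is needed, because a single 8-byte store covers it exactly.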