https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485
--- Comment #2 from petschy at gmail dot com ---
I agree that the generic case can become quite complicated: if, after the memset, the individual values are written with gaps between them, or as multiple contiguous chunks with gaps between them, it's not easy to tell whether a single memset + overwrites is better than multiple memsets over distinct regions + the individual byte writes, or anything in between. It all depends on the actual pattern.

However, a simplified approach I can think of is keeping track of the contiguous regions that are written, then trimming the regions based on order and overlap, or merging them if they are adjacent. In this particular case this would mean that

[0,199] : memset 0 + [0,31] : const init

could be converted to

[32,199] : memset 0 + [0,31] : const init

knowing that the const init comes later. A similar adjustment can be made if a second const init region overlaps the end of the memset region. Adjacent or overlapping const init regions can be merged.

But then, of course, the devil is in the details. Once the trivial merging and trimming of the intervals is done:
- at what length is it worth merging the memset into the const init regions, if a short memset is stuck between two const init regions?
- and vice versa, at what length is it worth having a single memset with an overwriting const init region in the middle vs. memset + const init + memset as disjoint regions?
- at what point is it worth storing the whole data in .rodata and just memcpy'ing it to the target?
- how to integrate regions of runtime-calculated values into the above?

For my particular case, I can work around this inefficiency by setting the buffer to the exact size. I have no idea how a simple region-based approach like the above would perform in general, or whether it would be worth the development effort.
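
To make the trimming and merging idea above a bit more concrete, here is a minimal sketch in C++. It is not GCC code: `Interval`, `merge_intervals` and `trim_memset` are hypothetical names invented for illustration, and it assumes the memset comes before every const init region it is trimmed against. It deliberately leaves a memset alone when a const init falls strictly inside it, since whether to split in that case is exactly the open cost question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type for illustration: a byte range [first, last], inclusive.
struct Interval {
    std::size_t first;
    std::size_t last;
};

// Merge adjacent or overlapping const init regions into maximal intervals.
// Assumes every 'last' is below SIZE_MAX, so 'last + 1' cannot wrap.
std::vector<Interval> merge_intervals(std::vector<Interval> regions) {
    std::sort(regions.begin(), regions.end(),
              [](const Interval& a, const Interval& b) { return a.first < b.first; });
    std::vector<Interval> merged;
    for (const Interval& r : regions) {
        if (!merged.empty() && r.first <= merged.back().last + 1) {
            // Overlaps or touches the previous interval: extend it.
            merged.back().last = std::max(merged.back().last, r.last);
        } else {
            merged.push_back(r);
        }
    }
    return merged;
}

// Trim an earlier memset against later const init regions, which must
// already be merged. Returns false if the memset is fully overwritten and
// can be dropped. A const init strictly inside the memset is not handled:
// the memset is kept whole in that case.
bool trim_memset(Interval& memset_region, const std::vector<Interval>& later_inits) {
    assert(memset_region.first <= memset_region.last);
    for (const Interval& c : later_inits) {
        const bool covers_start = c.first <= memset_region.first && c.last >= memset_region.first;
        const bool covers_end = c.last >= memset_region.last && c.first <= memset_region.last;
        if (covers_start && covers_end) {
            return false;  // every byte of the memset is overwritten later
        }
        if (covers_start) {
            memset_region.first = c.last + 1;  // cut off the overwritten head
        } else if (covers_end) {
            memset_region.last = c.first - 1;  // cut off the overwritten tail
        }
    }
    return true;
}
```

With the regions from this report, merging the const init regions gives [0,31], and trimming the memset [0,199] against it leaves [32,199]. None of the cost questions listed above (short memsets between const inits, .rodata + memcpy, runtime values) are addressed by this sketch.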