https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485

--- Comment #2 from petschy at gmail dot com ---
I agree that the generic case can become quite complicated: if after the
memset, the individual values are written with gaps between them, or multiple
contiguous chunks with gaps between them, it's not easy to tell whether having
a single memset + overwrites is better than having multiple memsets with
distinct regions + the individual byte writes, or anything in-between. It all
depends on the actual pattern.

However, a simplified approach I can think of is keeping track of
contiguous regions that are written, then trimming the regions based on the
order and overlap, or merging them if they are adjacent.

In this particular case this would mean that
 [0,199] : memset 0 + [0,31] : const init
could be converted to
 [32,199] : memset 0 + [0,31] : const init
knowing that the const init comes later.
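
Written out by hand, the two forms could look roughly like the sketch below.
This is only an illustration, not compiler output: the sizes match the
intervals above, but the function names and the initializer contents are made
up for the example.

```c
#include <assert.h>
#include <string.h>

enum { BUF_SIZE = 200, INIT_SIZE = 32 };

/* Hypothetical 32-byte constant initializer; the contents are arbitrary. */
static const unsigned char init_data[INIT_SIZE] = { 'c', 'o', 'n', 's', 't' };

/* [0,199] : memset 0 + [0,31] : const init */
static void fill_naive(unsigned char *buf)
{
    memset(buf, 0, BUF_SIZE);
    memcpy(buf, init_data, INIT_SIZE);
}

/* [32,199] : memset 0 + [0,31] : const init */
static void fill_trimmed(unsigned char *buf)
{
    memset(buf + INIT_SIZE, 0, BUF_SIZE - INIT_SIZE);
    memcpy(buf, init_data, INIT_SIZE);
}

/* Returns 1 if both strategies leave the buffer in the same state. */
static int fills_agree(void)
{
    unsigned char a[BUF_SIZE], b[BUF_SIZE];

    assert(INIT_SIZE <= BUF_SIZE);
    memset(a, 0xAA, sizeof a);   /* stand-in for uninitialized stack bytes */
    memset(b, 0x55, sizeof b);
    fill_naive(a);
    fill_trimmed(b);
    return memcmp(a, b, BUF_SIZE) == 0;
}
```

The trimmed form never writes bytes [0,31] twice, which is the whole saving
in this case.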

A similar adjustment can be made if a second const init region overlaps with
the end of the memset region. Adjacent or overlapping const init regions can be
merged.
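
A minimal sketch of the trimming and merging steps, assuming inclusive
[first, last] byte intervals as in the notation above. The struct and function
names are hypothetical and have nothing to do with GCC's internals; a real
implementation would also have to carry the initializer data along and let the
later write win wherever two const init regions overlap.

```c
#include <assert.h>
#include <stddef.h>

enum region_kind { REGION_MEMSET, REGION_CONST_INIT };

/* A written byte range [first, last], inclusive. */
struct region {
    enum region_kind kind;
    size_t first;
    size_t last;
};

/*
 * Trim `earlier` so it no longer covers bytes that `later` overwrites.
 * Only overlap at the start or at the end of `earlier` is handled; a
 * `later` region strictly in the middle leaves `earlier` untouched.
 * Returns 0 if `earlier` is completely overwritten (dead), 1 otherwise.
 */
static int trim_earlier(struct region *earlier, const struct region *later)
{
    assert(earlier->first <= earlier->last);
    if (later->last < earlier->first || later->first > earlier->last)
        return 1;                            /* no overlap */
    if (later->first <= earlier->first && later->last >= earlier->last)
        return 0;                            /* fully overwritten */
    if (later->first <= earlier->first)
        earlier->first = later->last + 1;    /* overlap at the start */
    else if (later->last >= earlier->last)
        earlier->last = later->first - 1;    /* overlap at the end */
    return 1;
}

/*
 * Merge `b` into `a` if both are const init regions that overlap or are
 * adjacent. Returns 1 if a merge happened, 0 otherwise.
 * (Ignores overflow of `last + 1` at SIZE_MAX for brevity.)
 */
static int merge_const_init(struct region *a, const struct region *b)
{
    if (a->kind != REGION_CONST_INIT || b->kind != REGION_CONST_INIT)
        return 0;
    if (b->first > a->last + 1 || a->first > b->last + 1)
        return 0;                            /* gap between them */
    if (b->first < a->first)
        a->first = b->first;
    if (b->last > a->last)
        a->last = b->last;
    return 1;
}
```

With a memset region [0,199] and a later const init region [0,31],
trim_earlier() shrinks the memset to [32,199], matching the conversion
described above.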

But then, of course, the devil is in the details: once the trivial merging
and trimming of the intervals is done,
 - at what length is it worth having the memset merged into the const init
   regions, if a short memset is stuck between two const init regions?
 - and vice versa, at what length is it worth having a single memset with
   an overwriting const init region in the middle vs memset + const init +
   memset as disjoint regions?
 - at what point is it worth storing the whole data in .rodata and just memcpy
   it to the target?
 - how to integrate regions of runtime calculated values into the above?

For my particular case, I can work around this inefficiency by setting the
buffer to the exact size. I have no idea how a simple region-based approach
like the above would perform in general and whether it would be worth the
development effort.
