On Sat, Jan 14, 2023 at 2:55 AM Alexandre Oliva <ol...@adacore.com> wrote:
>
> Hello, Richard,
>
> Thank you for the feedback.
>
> On Jan 12, 2023, Richard Biener <richard.guent...@gmail.com> wrote:
>
> > On Tue, Dec 27, 2022 at 5:12 AM Alexandre Oliva via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
>
> >> This patch extends the memset expansion to start with a loop, so as to
> >> still take advantage of known alignment even with long lengths, but
> >> without necessarily adding store blocks for every power of two.
>
> > I wonder if that isn't better handled by targets via the setmem pattern,
>
> That was indeed where I started, but then I found myself duplicating the
> logic in try_store_by_multiple_pieces on a per-target basis.
>
> Target-specific code is great for tight optimizations, but the main
> purpose of this feature is not an optimization.  AFAICT it actually
> slows things down in general (due to code growth, and to conservative
> assumptions about alignment), except perhaps for some microbenchmarks.
> It's rather a means to avoid depending on the C runtime, particularly
> due to compiler-introduced memset calls.

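To make the quoted patch description concrete, here is a rough C-level sketch of the shape being discussed: a loop that consumes the bulk of the length in the widest block, followed by one conditional store per smaller power of two. This is only an illustration under assumptions, not the patch's actual code: the real expansion happens inside GCC (the thread names try_store_by_multiple_pieces), and the names sketch_inline_memset, store_block and MAX_BLOCK, as well as the block size of 16, are made up for this example.

```c
#include <stddef.h>

/* Hypothetical helper: stands in for a single wide store of `block`
   bytes that the compiler would emit as one instruction.  */
static void store_block(unsigned char *p, unsigned char c, size_t block)
{
    for (size_t i = 0; i < block; i++)
        p[i] = c;
}

/* Illustrative only: fill `len` bytes at `dst` with `c` without
   calling memset.  Not GCC's implementation.  */
void *sketch_inline_memset(void *dst, int c, size_t len)
{
    enum { MAX_BLOCK = 16 };  /* assumed widest store; must be a power of two */
    unsigned char *p = dst;
    unsigned char byte = (unsigned char)c;

    /* Loop: long lengths are handled here, so no store block is needed
       for each power of two above MAX_BLOCK.  */
    while (len >= MAX_BLOCK) {
        store_block(p, byte, MAX_BLOCK);
        p += MAX_BLOCK;
        len -= MAX_BLOCK;
    }

    /* Tail: len < MAX_BLOCK now, so each set bit of len selects one
       smaller power-of-two block.  */
    for (size_t block = MAX_BLOCK / 2; block > 0; block /= 2) {
        if (len & block) {
            store_block(p, byte, block);
            p += block;
            len -= block;
        }
    }
    return dst;
}
```

Note that an optimizing compiler may itself recognize byte-filling loops like the one in store_block and turn them back into a memset call, which is the loop-distribution interaction raised in the reply below.
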
OK, that's what I guessed but you didn't spell out.  So does it make sense to mention -ffreestanding in the docs at least?  My fear is that we'd get complaints that -O3 -finline-memset-loops turns nicely optimized memset loops into dumb ones (via loop distribution and then stupid re-expansion).  So does it also make sense to turn off -floop-distribute-patterns[-memset] with -finline-memset-loops?

> My initial goal was to be able to show that inline expansion would NOT
> bring about performance improvements, but performance was not the
> concern that led to the request.
>
> If the approach seems generally acceptable, I may even end up extending
> it to other such builtins.  I have a vague recollection that memcmp is
> also an issue for us.

The C/C++ runtime produce at least memmove, memcpy and memcmp as well.

In this respect -finline-memset-loops is too specific and to avoid an explosion in the number of command line options we should try to come up with something better?  -finline-all-stringops[={memset,memcpy,...}] (just like x86 has -minline-all-stringops)?

> > like x86 has the stringop inline strategy.  What is considered acceptable
> > in terms of size or performance will vary and I don't think there's much
> > room for improvements on this generic code support?
>
> *nod* x86 is quite finely tuned already; I suppose other targets may
> have some room for additional tuning, both for performance and for code
> size, but we don't have much affordance for avoiding builtin calls to
> the C runtime, which is what this is about.
>
> Sometimes disabling loop distribution is enough to accomplish that, but
> in some cases GNAT itself resorts to builtin memset calls, in ways that
> are not so easy to avoid, and that would ultimately amount to expanding
> memset inline, so I figured we might as well offer that as a general
> feature, for users to whom this matters.
>
> Is (optionally) tending to this (uncommon, I suppose) need (or
> preference?) not something GCC would like to do?

Sure, I think for the specific intended purpose that would be fine.  It should also only apply to __builtin_memset calls, not to memset calls from user code?

Thanks,
Richard.

> --
> Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
> Free Software Activist    GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about <https://stallmansupport.org>