[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

Andrew Pinski changed:

What     |Removed |Added
See Also |        |https://github.com/llvm/llvm-project/issues/17135,
         |        |https://github.com/llvm/llvm-project/issues/12135

--- Comment #65 from Andrew Pinski ---
(In reply to Patrick J. LoPresti from comment #64)
> This bug should be trivial to fix by checking for self-assignment before
> calling memcpy(). Doesn't GCC inline the assignments for small objects
> anyway?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #64 from Patrick J. LoPresti --- The C (and POSIX) standards have had "restrict" on the arguments to memcpy() since C99. So calling it with overlapping arguments is undefined behavior and always has been. This bug should be trivial to fix by checking for self-assignment before calling memcpy(). Doesn't GCC inline the assignments for small objects anyway? I still have no idea why this is even controversial.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

Andrew Pinski changed:

What |Removed |Added
CC   |        |mikulas at artax dot karlin.mff.cuni.cz

--- Comment #63 from Andrew Pinski ---
*** Bug 115541 has been marked as a duplicate of this bug. ***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #62 from Rich Felker --- The process described there would have to end at least N bits before the end of the destination buffer. The point was that it would destroy information internal to the buffer at each step along the way, before it got to the end.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #61 from Richard Earnshaw --- Then I don't understand what you're trying to say in c57.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #60 from Rich Felker --- Nobody said anything about writing past end of buffer. Obviously you can't do that.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #59 from Richard Earnshaw --- Memcpy must never write beyond the end of the specified buffer, even if reading it is safe. That wouldn't be thread safe.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #58 from Jakub Jelinek ---
(In reply to Rich Felker from comment #57)
> and more concerned about the consequences of LTO/whole-program-analysis where
> something in the translation process can see the violated restrict
> qualifier, infer UB, and blow everything up.

That can't happen. In GCC, in GIMPLE these are represented as assignments, not {__builtin_,}memcpy calls, and are turned into the calls (or inline expansion of the copying) only when being expanded into RTL. All the LTO/whole-program optimizations happen on GIMPLE, so at that point nothing can be inferred from that because it simply isn't present in the IL; only after all LTO & IPA optimizations are done do individual functions go through the rest of the GIMPLE optimizations and then the RTL ones. The only exception to that is IPA-RA, which intra-partition (for LTO, otherwise within the TU) can take into account what hard registers are used/unused by previously emitted functions and take that knowledge into their callers emitted later; but for those this is a library call like any other.
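For context, a minimal sketch (not taken from the thread) of the kind of source that produces the call at issue. C11 6.5.16.1p3 permits an assignment whose source and destination overlap as long as the overlap is exact, so `*dst = *src` with `dst == src` is well defined; yet for a large aggregate the assignment may be expanded into a `memcpy` call at RTL expansion time, as described above. The struct type and function name are hypothetical, and whether a call is actually emitted depends on the target and optimization level.

```c
#include <stddef.h>

/* Hypothetical aggregate, large enough that its assignment is typically
   lowered to a library call rather than inline moves (target-dependent). */
struct big {
    unsigned char bytes[4096];
};

/* Plain C struct assignment. When dst == src the overlap is exact, which the
   language allows for simple assignment (C11 6.5.16.1p3). The compiler may
   still implement this statement as memcpy(dst, src, sizeof *dst). */
void assign_big(struct big *dst, const struct big *src) {
    *dst = *src;
}
```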
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #57 from Rich Felker --- I think one could reasonably envision an implementation that does some sort of vector loads/stores where, due to some performance constraint or avoiding special casing for possible page boundary past the end of the copy, it only wants to load N bits at a time, but the efficient store instruction always stores a full vector of 2N bits. Of course, one could also argue quite reasonably that this is a weird enough thing to do that the implementation should then just check for src==dest and early-out. I'm far less concerned about whether such mechanical breakage exists, and more concerned about the consequences of LTO/whole-program-analysis where something in the translation process can see the violated restrict qualifier, infer UB, and blow everything up. The change being requested here is really one of removing the restrict qualification from the arguments and making a custom weaker condition. This may in turn have consequences on what types of transformations are possible.
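A toy model of the implementation shape described above, not any real libc routine: each step loads one byte (standing in for N bits) but stores two (standing in for a 2N-bit vector store), and the last byte is stored alone so the copy never writes past `dst + n`. With disjoint buffers every padding byte is overwritten by the next step, so the result is correct; with `dst == src` the padding clobbers source bytes before they are read. The function name is hypothetical, and `restrict` is deliberately omitted so that calling it with exact overlap is not itself undefined.

```c
#include <stddef.h>

/* Hypothetical "narrow load, wide store" copy. Each loop step reads one byte
   from src and writes two bytes to dst: the real byte plus a zero pad that
   the next step is expected to overwrite. */
static void *wide_store_copy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;
    for (; i + 1 < n; i++) {
        unsigned char b = s[i];  /* narrow load */
        d[i] = b;                /* wide store: the real byte... */
        d[i + 1] = 0;            /* ...plus padding inside the buffer */
    }
    if (i < n)
        d[i] = s[i];             /* final byte stored alone: no write past the end */
    return dst;
}
```

With distinct buffers this copies correctly; with `dst == src` every byte after the first ends up zero, because the pad written at step `i` destroys `src[i + 1]` before it is loaded.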
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #56 from Richard Earnshaw --- I've never heard of a memcpy implementation that corrupts data if called with memcpy (p, p, n). (The problems come from partial overlaps where the direction of the copy may matter). Has anybody considered asking the standards committee to bless this as a special exception? Of course, if n is large, then performing an early test is still worthwhile, but for small n, the cost of the check possibly exceeds the benefit of eliding the copy.
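To illustrate the point about partial overlaps with a sketch (hypothetical helper names, byte-at-a-time for clarity): when the ranges overlap partially, the copy direction decides the result, which is why `memmove` has to choose a direction; when the overlap is exact, both directions simply rewrite each byte with its own value.

```c
#include <stddef.h>

/* Forward byte copy: gives memmove semantics when dst <= src or the ranges
   are disjoint. */
static void copy_forward(unsigned char *dst, const unsigned char *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Backward byte copy: gives memmove semantics when dst >= src or the ranges
   are disjoint. */
static void copy_backward(unsigned char *dst, const unsigned char *src, size_t n) {
    for (size_t i = n; i-- > 0; )
        dst[i] = src[i];
}
```

Copying `"abcde"` onto itself shifted by one byte gives `"aaaaa"` going forward but `"aabcd"` going backward; copying it onto itself at the same address leaves it unchanged either way.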
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #55 from Florian Weimer ---
(In reply to post+gcc from comment #52)
> For the point discussed earlier with the `restrict` in the musl memcpy, I
> had another look at the definition of `restrict` and it's not entirely clear
> to me any more that there is UB here. The restrict rules only apply to
> objects that are "also modified (by any means)". Now the question is, does
> "*X = *X" modify the object?

C11 says this:

| NOTE: “Modify” includes the case where the new value being stored is the
| same as the previous value.

So at least this should be quite clear (although I think notes are supposedly informal).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #54 from Richard Biener ---
(In reply to post+gcc from comment #52)
> For the point discussed earlier with the `restrict` in the musl memcpy, I
> had another look at the definition of `restrict` and it's not entirely clear
> to me any more that there is UB here. The restrict rules only apply to
> objects that are "also modified (by any means)". Now the question is, does
> "*X = *X" modify the object? Is a write always a modification or only if the
> stores representation changes or only if the stored value changes?
>
> If it requires a representation change, then "memcpy(x, x, n)" does not
> modify anything, and hence there is no UB from "restrict".

Heh, interesting splitting of hairs ;) I guess you need to file a DR with the language to resolve this question ...
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #53 from rguenther at suse dot de ---
On Tue, 28 Nov 2023, post+gcc at ralfj dot de wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667
>
> --- Comment #51 from post+gcc at ralfj dot de ---
> Oh great, I love it when one part of the C standard just adds exceptions to
> statements made elsewhere. It's almost as if the authors want this to be as
> hard to understand as possible...
>
> That then raises the question which version of the signature is actually used
> for building (and optimizing) the function: the one in the declaration or the
> one in the definition. Does the standard have an answer to that?

For avoidance of doubt the frontends should drop non-semantic qualifiers from declarations then just in case the middle-end tries to apply semantics there. Like it does for const qualified reference arguments (OK, that's not C but C++). The middle-end also uses the qualifiers for diagnostic purposes.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #52 from post+gcc at ralfj dot de --- For the point discussed earlier with the `restrict` in the musl memcpy, I had another look at the definition of `restrict` and it's not entirely clear to me any more that there is UB here. The restrict rules only apply to objects that are "also modified (by any means)". Now the question is, does "*X = *X" modify the object? Is a write always a modification, or only if the stored representation changes, or only if the stored value changes? If it requires a representation change, then "memcpy(x, x, n)" does not modify anything, and hence there is no UB from "restrict".
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #51 from post+gcc at ralfj dot de --- Oh great, I love it when one part of the C standard just adds exceptions to statements made elsewhere. It's almost as if the authors want this to be as hard to understand as possible... That then raises the question which version of the signature is actually used for building (and optimizing) the function: the one in the declaration or the one in the definition. Does the standard have an answer to that?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #50 from joseph at codesourcery dot com --- Qualifiers on function parameter types do not affect type compatibility or composite type (see 6.7.6.3#14). I think they're only actually of significance in the definition; in a declaration they effectively serve as documentation.
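A minimal sketch of the situation described above (the name `my_copy` is hypothetical): the prototype carries `restrict`, like the `<string.h>` declaration of `memcpy`, while the definition does not. The two declarations are still compatible, because each parameter's qualifiers are dropped when checking compatibility, so this compiles without a diagnostic; following the comment above, only the definition's (unqualified) parameters would then matter for the body.

```c
#include <stddef.h>

/* Declaration with restrict-qualified parameters, as in <string.h>. */
void *my_copy(void *restrict dst, const void *restrict src, size_t n);

/* Definition without restrict. Parameter qualifiers do not take part in
   type compatibility, so this definition matches the declaration above. */
void *my_copy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}
```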
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #49 from post+gcc at ralfj dot de --- Even glibc itself seems to use `restrict`: https://codebrowser.dev/glibc/glibc/string/string.h.html#43 So the compiler building glibc might inadvertently rely on the memory written through dst and the memory read through src being disjoint, making even the perfectly-overlapping case UB (unless the implementation has a guard somewhere that skips the copy when src==dst, but I was not able to find such a guard). (The implementation at https://codebrowser.dev/glibc/glibc/string/memcpy.c.html does not have the `restrict`, but it includes the string.h header and I think the compiler is allowed to apply attributes from the declaration to the definition. Or, alternatively, it might even be UB to have `restrict` in one place and not the other: "All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined" [C23 §6.2.7.2] and "For two qualified types to be compatible, both shall have the identically qualified version of a compatible type; the order of type qualifiers within a list of specifiers or qualifiers does not affect the specified type" [C23 §6.7.3.11].)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #48 from post+gcc at ralfj dot de ---
> Note, clang makes the same assumption apparently (while MSVC emits rep movs
> inline and ICC either that, or calls _intel_fast_memcpy).

MSVC does the same thing as clang and GCC, if godbolt is to be trusted: https://rust.godbolt.org/z/o7TevfvcY
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #47 from Jakub Jelinek ---
(In reply to Richard Biener from comment #46)
> So yes, the most clean solution would be to have __forgiving_memcpy
> possibly also allowing NULL pointers when n == 0 besides allowing
> the exact overlap. Its prototype wouldn't have restrict then or
> nonnull.

That is basically what I've argued above (however it is called), and I completely agree also on the NULL pointers when n == 0 case for it; I think that one came up in the Honza vs. Jonathan libstdc++ discussions again recently. And, ideally it could be implemented in libc as an alias to memcpy if e.g. the assembly-written memcpy satisfies all the requirements in it, but valgrind could implement it separately and do there

    if (n == 0 || dst == src)
      return dst;
    return memcpy (dst, src, n);

Of course, it would take some time, because it needs to be in libc first, and gcc needs to key on the versions which have it. But then possibly in such a case it could also fold

    if (n != 0)
      memcpy (dst, src, n);

into

    __forgiving_memcpy (dst, src, n);

or __builtin_* variants thereof.
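A sketch of the wrapper outlined in comments #46 and #47, written as a stand-alone C function. The name drops the reserved leading underscores and is hypothetical; in a real libc it might instead be an alias of an assembly `memcpy` already known to meet these rules. Partially overlapping ranges remain undefined, exactly as for `memcpy`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical forgiving_memcpy: no restrict and no nonnull on the
   parameters, so n == 0 with NULL pointers and exact overlap (dst == src)
   are both permitted. Partial overlap is still not supported. */
void *forgiving_memcpy(void *dst, const void *src, size_t n) {
    if (n == 0 || dst == src)
        return dst;               /* nothing to copy, or the copy is a no-op */
    return memcpy(dst, src, n);   /* remaining calls meet memcpy's contract */
}
```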
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #46 from Richard Biener ---
I'll note that restrict on the parameters can help optimizing the copying loop because you can then unroll and hoist loads before stores. I'll also note that almost all (important) targets implement memcpy with special assembler in libc.

But yes, a compiler might, seeing

    void foo (char * __restrict d, char * s)
    {
      *d = *s;
      if (d != s)
        abort ();
    }

optimize the compare to always true because of the wording of the C standard with regard to restrict. GCC doesn't (yet) do this. A compiler might also transform

    foo (p, p);

to trap (as part of eliminating paths with undefined behavior) if it can analyze foo inter-procedurally seeing that foo will always invoke undefined behavior if d == s. For memcpy the same would hold true but the compiler would need to know that the size of the copy is not zero.

So yes, the most clean solution would be to have __forgiving_memcpy possibly also allowing NULL pointers when n == 0 besides allowing the exact overlap. Its prototype wouldn't have restrict then or nonnull.

This all doesn't change the fact that GCC up to now always assumed memcpy with exact overlap is fine to do, so it's practically impossible to change implementations without recompiling the world with a "fixed" compiler.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #45 from Tim Ruffing ---
Ha, I didn't want to move the discussion to restrict, but as a final remark: The formal semantics of restrict are not easy to parse and understand (see https://www.iso-9899.info/n3047.html#6.7.3.1 ). But just the sentence "If L is used to access the value of the object X that it designates, and X is also modified (by any means), then the following requirements apply ..." should make clear that no requirement is made at all if no object is modified. And this is clearly the case for `if (src==dest) return;`

Besides that, I agree with what Ralf said:

> I'm not a compiler dev nor a libc dev, I just want to make sure that my
> compiler and my libc use the same contract when talking to each other -- but
> I hope someone who is a compiler dev or a libc dev can go and actually test
> these hypotheses, rather than just speculating about it as has been happening
> so far.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #44 from Rich Felker --- My naive expectation is that "if ((uintptr_t)src == 0x400400)" is and should be UB, but I may be misremembering the details of the formalism by which the spec for restrict is implemented. If so, that's kinda a help, but I still think you would want to remove restrict from the arguments and apply it later, so that the fast-path head/tail copies can avoid any branch, and the check for equality can be deferred until it's known that there's a "body remainder" to copy. That's the part where you really want the benefits of restrict anyway -- without restrict it's not vectorizable because the compiler has to assume there might be nonexact overlap, in which case reordering the loads and stores in any way could change the result.
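A sketch of why reordering is only safe without partial overlap (helper names are hypothetical, and `n` is assumed to be a multiple of 4 for brevity): issuing a group of loads before the matching stores, as a vectorizer would, gives a different result from the in-order loop when the ranges overlap partially, so without `restrict` the compiler has to keep the original order. With exact overlap both orders store every byte back unchanged.

```c
#include <stddef.h>

/* Copy in groups of four bytes, doing all four loads before any store: the
   kind of reordering restrict allows. Parameters are deliberately NOT
   restrict-qualified so that calling this with overlapping ranges is defined.
   Assumes n is a multiple of 4. */
static void copy_loads_first(unsigned char *dst, const unsigned char *src, size_t n) {
    for (size_t i = 0; i + 4 <= n; i += 4) {
        unsigned char a = src[i], b = src[i + 1], c = src[i + 2], d = src[i + 3];
        dst[i] = a;
        dst[i + 1] = b;
        dst[i + 2] = c;
        dst[i + 3] = d;
    }
}

/* Plain in-order loop, one load and one store at a time, for comparison. */
static void copy_in_order(unsigned char *dst, const unsigned char *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

Shifting `"abcde"` right by one byte yields `"aaaaa"` with the in-order loop but `"aabcd"` with the loads-first version, while copying a buffer onto itself gives the same (unchanged) result with both.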
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #43 from post+gcc at ralfj dot de --- That is not my reading of the standard, but absent a proper (formal, mathematical) spec I guess it is hard to tell. With your reading, "if ((uintptr_t)src == 0x400400)" is UB, since changing the "src" argument to a different copy located at that address would change the execution. I strongly doubt that is the intent of the standard.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #42 from Rich Felker ---
> I'm not saying that such an implementation will be a good idea, but just a
> remark: You could, in fact, keep restrict for the arguments in this case,
> because the object pointed to by src and dest is not accessed at all when
> src==dest. So this is correct code according to the standard. (The exact
> semantics of restrict are a bit involved...)

Nope, UB is invoked as soon as you evaluate src==dest, even with no dereferencing. The semantics of restrict are such that the behavior of the code must be unchanged if the pointer were replaced with a pointer to a relocated copy of the pointed-to object. Since this would alter the result of the == operator, that constraint is not satisfied and thereby the behavior is undefined.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #41 from post+gcc at ralfj dot de ---
> This entitles a compiler to emit asm equivalent to if (src==dest) system("rm
> -rf /") if it likes.

No it does not. restrict causes UB if the two pointers are used to access the same memory. It has nothing to do with whether the pointers are equal. So it would have to be "if (src==dest && n>0)" and the compiler would have to first prove that "n>0" implies that later accesses through both pointers occur at offset 0 (and at least one of them is a write). But it's still UB to call this the way GCC does, that much I agree with.

> Our memcpy is not written in asm but in C, and it has the restrict qualifier
> on src and dest.

The question is, does that qualifier help? If you remove it, does the generated assembly change in any way, does the performance change? If not, it clearly doesn't matter and can just be removed. If yes, then yeah compilers clearly shouldn't call this with identical ranges.

Basically, compiler devs are claiming that libc can support the src==dest case "for free", without any noticeable cost to other uses of the function. libc devs are claiming that compilers can insert a branch that tests for equality, without any noticeable cost. Both of these are testable hypotheses. I'm not a compiler dev nor a libc dev, I just want to make sure that my compiler and my libc use the same contract when talking to each other -- but I hope someone who is a compiler dev or a libc dev can go and actually test these hypotheses, rather than just speculating about it as has been happening so far.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #40 from Sam James --- https://git.musl-libc.org/cgit/musl/tree/src/string/memcpy.c#n5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #39 from Tim Ruffing ---
(In reply to Rich Felker from comment #36)
> Our memcpy is not written in asm but in C, and it has the restrict qualifier
> on src and dest.

For my curiosity, can you point me to this implementation? I couldn't find a memcpy with restrict in the libc source, but I suspect that's just because I'm not familiar with the code base.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #38 from Tim Ruffing ---
(In reply to Rich Felker from comment #36)
> If you're happy with a branch, you could probably take restrict off the
> arguments and do something like:
>
>     if (src==dest) return;
>     const char *restrict src2 = src;
>     char *restrict dest2 = dest;
>     ...

I'm not saying that such an implementation will be a good idea, but just a remark: You could, in fact, keep restrict for the arguments in this case, because the object pointed to by src and dest is not accessed at all when src==dest. So this is correct code according to the standard. (The exact semantics of restrict are a bit involved...)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #37 from Rich Felker --- Also: has anyone even looked at what happens if a C memcpy with proper use of restrict gets LTO-inlined into a caller with a GCC-generated memcpy call where src==dest? That sounds like a case likely to blow up...
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #36 from Rich Felker ---
> the assembly generated by the current implementations already supports that
> case.

Our memcpy is not written in asm but in C, and it has the restrict qualifier on src and dest. This entitles a compiler to emit asm equivalent to if (src==dest) system("rm -rf /") if it likes. I don't know how you can write a valid C implementation of memcpy that "doesn't care" about 100% overlap without giving up restrict (and the benefits it entails) entirely.

If you're happy with a branch, you could probably take restrict off the arguments and do something like:

    if (src==dest) return;
    const char *restrict src2 = src;
    char *restrict dest2 = dest;
    ...

but that's shoving the branch into memcpy, where it's a cost on every caller making dynamic memcpys with potentially tiny size (like qsort, etc.) and obeying the contract not to call with overlapping src/dest, rather than just imposing it on bad callers.
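Filled out into a complete function, the pattern sketched in comment #36 might look like the following. This is an illustration under stated assumptions, not musl's actual code: the function name is hypothetical, and whether the restrict-qualified locals buy the same optimizations as restrict-qualified parameters depends on the compiler. The early-return branch it adds on every call is exactly the cost the comment objects to.

```c
#include <stddef.h>

/* Hypothetical guarded_copy: parameters are NOT restrict-qualified, the
   exact-overlap case returns early, and only the copy loop works through
   restrict-qualified locals. Partial overlap remains unsupported. */
void *guarded_copy(void *dst, const void *src, size_t n) {
    if (dst == src)
        return dst;                      /* exact overlap: the copy is a no-op */
    unsigned char *restrict d = dst;     /* from here on, promise no aliasing */
    const unsigned char *restrict s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}
```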
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #35 from Jakub Jelinek --- Note, clang makes the same assumption apparently (while MSVC emits rep movs inline and ICC either that, or calls _intel_fast_memcpy). As I mentioned earlier, if glibc and other libraries provide an alias to memcpy or memmove (whichever guarantees that exact overlap will be handled right in the implementation), GCC could start using that instead of memcpy for those cases. And it would be up to the libraries to decide which implementation satisfies that.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #34 from CVS Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:7758cb4b53e8a33642709402ce582f769eb9fd18

commit r14-5772-g7758cb4b53e8a33642709402ce582f769eb9fd18
Author: Richard Biener
Date:   Thu Nov 23 08:54:56 2023 +0100

    middle-end/32667 - document cpymem and memcpy exact overlap requirement

    The following amends the cpymem documentation to mention that
    exact overlap needs to be handled gracefully, also noting that
    the target runtime is expected to behave the same way where
    -ffreestanding docs mention the set of routines required.

            PR middle-end/32667
            * doc/md.texi (cpymem): Document that exact overlap of source
            and destination needs to work.
            * doc/standards.texi (ffreestanding): Mention memcpy is required
            to handle the exact overlap case.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #33 from rguenther at suse dot de ---
On Thu, 23 Nov 2023, fw at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667
>
> --- Comment #32 from Florian Weimer ---
> There's this in standards.texi:
>
>   Most of the compiler support routines used by GCC are present in
>   @file{libgcc}, but there are a few exceptions.  GCC requires the
>   freestanding environment provide @code{memcpy}, @code{memmove},
>   @code{memset} and @code{memcmp}.
>   Finally, if @code{__builtin_trap} is used, and the target does
>   not implement the @code{trap} pattern, then GCC emits a call
>   to @code{abort}.
>
> Maybe that would be a place to mention this issue, too?

Will add.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #32 from Florian Weimer ---
There's this in standards.texi:

  Most of the compiler support routines used by GCC are present in
  @file{libgcc}, but there are a few exceptions.  GCC requires the
  freestanding environment provide @code{memcpy}, @code{memmove},
  @code{memset} and @code{memcmp}.
  Finally, if @code{__builtin_trap} is used, and the target does
  not implement the @code{trap} pattern, then GCC emits a call
  to @code{abort}.

Maybe that would be a place to mention this issue, too?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #31 from Richard Biener ---
(In reply to Patrick J. LoPresti from comment #29)
> (In reply to Jakub Jelinek from comment #27)
> >
> > No, that is not a reasonable fix, because it severely pessimizes common code
> > for a theoretical only problem.
>
> The very existence of (and interest in) this bug report means it is
> obviously not "a theoretical only problem".
>
> And of course Rich Felker is correct that the cost of the obvious fix is
> trivial and not remotely "severe".

But I didn't see a patch proposed to address this issue, which means it doesn't seem to be trivial.

> But the bottom line is that GCC is emitting library calls that invoke
> undefined behavior. At a minimum, GCC should document this non-standard
> requirement on its runtime environment. Has anyone bothered to do that? Why
> not?

I think it's written down somewhere but I can't quickly find it (I also wonder where exactly the best place to document it would be; it's related to porting GCC to a new target architecture I guess, not so much user-facing). OTOH I see

  @cindex @code{cpymem@var{m}} instruction pattern
  @item @samp{cpymem@var{m}}
  ...
  The @code{cpymem@var{m}} patterns need not give special consideration
  to the possibility that the source and destination strings might
  overlap. These patterns are used to do inline expansion of
  @code{__builtin_memcpy}.

which is possibly the closest piece we have and which fails to mention exact overlap. I'll propose an adjustment to this.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #30 from post+gcc at ralfj dot de ---
There have been several assertions above that a certain way to solve this either has no performance cost at all or severe performance cost. That sounds like we are missing data -- ideally, someone would benchmark the actual cost of emitting that branch. It seems kind of pointless to just make assertions about the impact of this change without real data.

> On the other hand, expecting the libc memcpy to make this check greatly
> pessimizes every reasonable small use of memcpy with a gratuitous branch for
> what is undefined behavior and should never appear in any valid program.

I don't think this is true. As far as I can see, the performance impact of having memcpy support the src==dest case is zero -- the assembly generated by the current implementations already supports that case. (At least I have not seen any evidence to the contrary.) No new check in memcpy is required.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #29 from Patrick J. LoPresti ---
(In reply to Jakub Jelinek from comment #27)
>
> No, that is not a reasonable fix, because it severely pessimizes common code
> for a theoretical only problem.

The very existence of (and interest in) this bug report means it is obviously not "a theoretical only problem".

And of course Rich Felker is correct that the cost of the obvious fix is trivial and not remotely "severe".

But the bottom line is that GCC is emitting library calls that invoke undefined behavior. At a minimum, GCC should document this non-standard requirement on its runtime environment. Has anyone bothered to do that? Why not?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #28 from Rich Felker ---
> No, that is not a reasonable fix, because it severely pessimizes common code
> for a theoretical only problem.

Far less than a call to memmove (which necessarily has something comparable to that and other unnecessary branches) pessimizes it. I also disagree that it's severe. On basically any machine with branch prediction, the branch will be predicted correctly all the time and has basically zero cost. On the other hand, the branches in memmove could go different ways depending on the caller, so it's much more machine-capability-dependent whether they can be predicted.

In some sense the optimal thing to do is "nothing", just assuming it would be hard to write a memcpy that fails on src==dest. However, at the very least this precludes hardened memcpy trapping on src==dest, which might be a useful hardening feature (or rather on a range test for overlapping, which would happen to also catch exact overlap). So it would be nice if it were fixed.

FWIW, I don't think single branches are relevant to overall performance in cases where the compiler is doing something reasonable by emitting a call to memcpy to implement assignment. If the object is small enough that the branch is relevant, the call overhead is even more of a big deal, and it should be inlining loads/stores to perform the assignment.
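A sketch of the hardening mentioned above, assuming a libc that chose to enforce its contract (both function names are hypothetical). A range test for overlap rejects partial overlap and, as a side effect, also the exact overlap that GCC's expansion of aggregate assignment can produce, which is why the current compiler behavior rules such a check out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Do [a, a+na) and [b, b+nb) overlap? Addresses are compared as uintptr_t
   because relational comparison of pointers into different objects is
   undefined in ISO C; the conversion is implementation-defined but behaves
   as expected on flat address spaces. */
static int ranges_overlap(const void *a, size_t na, const void *b, size_t nb) {
    uintptr_t x = (uintptr_t)a, y = (uintptr_t)b;
    return na != 0 && nb != 0 && x < y + nb && y < x + na;
}

/* Hypothetical hardened memcpy: traps on any overlap, which necessarily
   includes the exact-overlap case dst == src. */
void *hardened_memcpy(void *dst, const void *src, size_t n) {
    if (ranges_overlap(dst, n, src, n))
        abort();
    return memcpy(dst, src, n);
}
```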
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #27 from Jakub Jelinek ---
(In reply to Rich Felker from comment #26)
> > The only reasonable fix on the compiler side is to never emit memcpy but
> > always use memmove.
>
> No, it can literally just emit (equivalent at whatever intermediate form of):
>
>       cmp src,dst
>       je 1f
>       call memcpy
>   1:
>
> in place of memcpy.

No, that is not a reasonable fix, because it severely pessimizes common code for a theoretical only problem.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #26 from Rich Felker ---
> The only reasonable fix on the compiler side is to never emit memcpy but
> always use memmove.

No, it can literally just emit (equivalent at whatever intermediate form of):

      cmp src,dst
      je 1f
      call memcpy
  1:

in place of memcpy. It can even optimize out that in the case where it's provable that they're not equal, e.g. presence of restrict or one of the two objects not having had its address taken/leaked.
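The same idea expressed at the C level, as a sketch (the helper name is hypothetical and this is not code GCC contains). Because C only permits exact overlap for aggregate assignment, a single equality test is enough to keep every remaining call within `memcpy`'s contract; where the compiler can prove the addresses differ, the test can be dropped as described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lowering of an aggregate copy: compare the addresses and skip
   the call when they are equal, so memcpy never sees dst == src. Partial
   overlap cannot arise from a valid C assignment, so it is not handled. */
static inline void guarded_block_copy(void *dst, const void *src, size_t n) {
    if (dst != src)
        memcpy(dst, src, n);
}
```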
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #25 from rguenther at suse dot de ---
On Tue, 21 Nov 2023, bugdal at aerifal dot cx wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667
>
> Rich Felker changed:
>
> What    |Removed |Added
> CC      |        |bugdal at aerifal dot cx
>
> --- Comment #24 from Rich Felker ---
> If the copy is such that gcc is happy to emit an external call to memcpy for
> it, there is no significant size or performance cost to emitting a branch
> checking for equality before making the call, and performing this branch would
> greatly optimize the (maybe rare in the caller, maybe not) case of
> self-assignment!
>
> On the other hand, expecting the libc memcpy to make this check greatly
> pessimizes every reasonable small use of memcpy with a gratuitous branch for
> what is undefined behavior and should never appear in any valid program.
>
> Fix it on the compiler side please.

The only reasonable fix on the compiler side is to never emit memcpy but always use memmove. The pessimization of that compared to the non-existing "real" issue with calling memcpy with exact overlap is the reason for the non-action.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 Rich Felker changed: What|Removed |Added CC||bugdal at aerifal dot cx --- Comment #24 from Rich Felker --- If the copy is such that gcc is happy to emit an external call to memcpy for it, there is no significant size or performance cost to emitting a branch checking for equality before making the call, and performing this branch would greatly optimize the (maybe rare in the caller, maybe not) case of self-assignment! On the other hand, expecting the libc memcpy to make this check greatly pessimizes every reasonable small use of memcpy with a gratuitous branch for what is undefined behavior and should never appear in any valid program. Fix it on the compiler side please.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 post+gcc at ralfj dot de changed: What|Removed |Added CC||post+gcc at ralfj dot de --- Comment #23 from post+gcc at ralfj dot de --- > Is glibc community ready to provide such guarantee? This is indeed a key question here, I think. Currently GCC makes assumptions that *even the libc produced by the same project* does not document as stable guarantees. That's rather dissonant. The GNU project should at least come to a proper conclusion, within itself, on whether memcpy should be UB or not when both pointers are equal. Right now we have everyone pointing at everyone else, and users are left in the rain with their valgrind errors. Ideally, of course, the C standard would be updated so that, slowly but steadily, the memcpy contract comes to match reality. That will take a while, but given that this issue was filed 16 years ago (!), there clearly would have been enough time. (If someone does take this on, please join forces with the clang people who are interested in getting C updated: https://reviews.llvm.org/D86993#4585590). But GNU controls glibc, so there's not really any excuse for not updating those docs, I think? glibc making such a move would be a great step towards convincing valgrind and the C committee that memcpy should have defined behavior when both pointers are equal.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #22 from Patrick J. LoPresti --- I disagree that bug 108296 is a duplicate. That bug requires code that, at least arguably, invokes undefined behavior. See e.g. https://stackoverflow.com/q/7292862/ and https://stackoverflow.com/q/61070828/. This bug is about clearly valid C++ code (object self-assignment) for which GCC emits clearly invalid calls to memcpy() (with dest == src). Now, I suspect what Andrew is thinking is that both of these bugs could be resolved by invoking memmove() instead of memcpy(). That seems like a reasonable idea to me, since small assignments get inlined and large assignments can amortize the overhead. But this bug could also be resolved in other ways, a few of which have been suggested in these comments.
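For illustration, a hypothetical C analog of the self-assignment case (the reports above concern C++ `operator=`; `struct big`, `assign_big`, and `self_assign_demo` are names invented for this sketch). Whether GCC lowers `*d = *s` to an out-of-line `memcpy` call depends on the object size, target, and options; when it does, `assign_big(&x, &x)` turns into a `memcpy` with dest == src even though the C source is valid, since C11 6.5.16.1p3 permits assignment between exactly overlapping objects of compatible type.

```c
#include <assert.h>
#include <string.h>

/* Large enough that GCC may emit a memcpy call for the assignment;
   the actual threshold is target- and option-dependent. */
struct big {
    char bytes[8193];
};

/* Plain struct assignment. Nothing tells the compiler whether d and s
   alias, and a caller passing the same pointer twice is legal C. */
void assign_big(struct big *d, const struct big *s)
{
    *d = *s;
}

/* Self-assignment: exact overlap, valid per C11 6.5.16.1p3. */
void self_assign_demo(void)
{
    static struct big x;                 /* static: avoid an 8 KiB stack frame */
    memset(x.bytes, 'A', sizeof x.bytes);
    assign_big(&x, &x);                  /* d == s */
    assert(x.bytes[0] == 'A' && x.bytes[sizeof x.bytes - 1] == 'A');
}
```

In practice the contents survive, as the commenters note; the complaint is that tools such as Valgrind see the resulting `memcpy(p, p, n)` as an invalid call.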
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #21 from Nadav Har'El --- This old problem has become a real problem in gcc 12, with a real effect: incorrect code generation, where code that copies an object was incorrectly "optimized" to use __builtin_memcpy() instead of __builtin_memmove() even though the source and destination objects may overlap; and it turns out that __builtin_memcpy() may produce incorrect results for overlapping addresses. This bug was discovered in the OSv project https://github.com/cloudius-systems/osv/issues/1212 with code that doesn't (obviously) call __builtin_memcpy() directly, but rather had a 27-character type being copied, and the compiler implemented this copy with a call to __builtin_memcpy(). Here is an example of code which generates the wrong results (note the missing "c" in the result and the unexpectedly doubled "g"):

#include <stdio.h>
int main(){
    char buf[128] = "0123456789abcdefghijklmnopqrstuvwxyz";
    struct [[gnu::packed]] data {
        char x[27];
    };
    void *p0 = buf;
    void *p1 = &buf[1];
    *static_cast<data*>(p0) = *static_cast<data*>(p1);
    printf("%s", buf);
}
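For contrast, a hedged sketch of what the copy in that reproducer is expected to produce when it goes through `memmove`, which is specified for overlapping regions. It is written in C rather than the reproducer's C++, and `RECORD_SIZE` and `shift_record_left` are names chosen for this example. Shifting the 27-byte record one byte to the left should yield the 27 characters starting at '1', followed by the untouched tail, so 'r' appears twice and nothing else is lost or duplicated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the reproducer's packed 27-byte struct. */
enum { RECORD_SIZE = 27 };

/* Copy the 27-byte record starting at buf[1] onto buf[0]. The two ranges
   overlap in 26 bytes: memcpy gives no guarantee here, while memmove
   behaves as if the source were first copied to a temporary buffer. */
void shift_record_left(char *buf, size_t len)
{
    assert(len >= RECORD_SIZE + 1);      /* source range ends at buf[27] */
    memmove(buf, buf + 1, RECORD_SIZE);
}
```

With the reproducer's buffer, the expected result is `123456789abcdefghijklmnopqrrstuvwxyz`; a `memcpy` that copies in wide chunks is free to produce something else, such as the output reported above.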
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 Andrew Pinski changed: What|Removed |Added CC||nyh at math dot technion.ac.il --- Comment #20 from Andrew Pinski --- *** Bug 108296 has been marked as a duplicate of this bug. ***
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #19 from rguenther at suse dot de --- On Wed, 9 Jun 2021, public at timruffing dot de wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 > > Tim Ruffing changed: > >What|Removed |Added > > CC||public at timruffing dot de > > --- Comment #18 from Tim Ruffing --- > Is there still interest in fixing this? This still leads to spurious Valgrind > warnings in the real world. I don't think this is ever going to be "fixed" on the GCC side; instead, maybe valgrind should at least be given an option to ignore those cases.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 Tim Ruffing changed: What|Removed |Added CC||public at timruffing dot de --- Comment #18 from Tim Ruffing --- Is there still interest in fixing this? This still leads to spurious Valgrind warnings in the real world.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #17 from Alexander Cherepanov --- On 2016-04-25 17:34, joseph at codesourcery dot com wrote: > Or it could simply document an additional requirement on the C library > used with GCC (that the case of exact overlap works), just as it documents > the requirement that some functions such as memcpy be provided even in the > freestanding case. The compiler and library implementations need to > cooperate to produce a conforming C implementation. 1. Is the glibc community ready to provide such a guarantee? The doc[1] doesn't currently mention an exception for exact overlap. [1] https://www.gnu.org/software/libc/manual/html_node/Copying-Strings-and-Arrays.html#index-memcpy 2. I agree that it should be documented, together with the assumption that memcpy doesn't clobber errno. 3. The decision affects the whole Free Software community. If memcpy calls with exact overlap inserted by gcc are legitimized, then users cannot use tools like valgrind to catch such cases in their own code. (And neither _FORTIFY_SOURCE nor UBSan seem to catch overlapping memcpy now.) So it effectively leads to legitimizing such memcpy calls in users' code too. Making memcpy closer to a simple assignment (C11, 6.5.16.1p3) could be a good thing. Maybe even fix the C standard or POSIX? BTW the llvm bug is here: https://llvm.org/bugs/show_bug.cgi?id=11763 . It also contains links to valgrind discussions.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #16 from Patrick J. LoPresti --- Well, Valgrind itself is another real-world example. Tools like Valgrind cannot distinguish invalid memcpy() calls by the programmer from invalid memcpy() calls emitted by GCC. You can, of course, redefine every standard interface you like and simply document it. That would be awful engineering -- as in this case -- but you could do it.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #15 from joseph at codesourcery dot com --- On Sun, 24 Apr 2016, lopresti at gmail dot com wrote: > That said, this is clearly a real bug in GCC. memcpy has a well-defined > interface; GCC emits calls violating that interface; therefore GCC is buggy. I > do not see why this is even controversial. (Also see Julian Seward's > description of real-world problems in comment 5.) As far as I know, that does not describe real-world problems on any target supported by GCC. > GCC either needs to provide its own memcpy or it needs to respect the > interface. The cost of the latter is surely trivial, especially since many > cases could be optimized by alias analysis. Or it could simply document an additional requirement on the C library used with GCC (that the case of exact overlap works), just as it documents the requirement that some functions such as memcpy be provided even in the freestanding case. The compiler and library implementations need to cooperate to produce a conforming C implementation.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #14 from Patrick J. LoPresti --- D Hugh Redelmeier in comment 12 is mistaken... memcpy is a reserved identifier (see e.g. http://stackoverflow.com/a/23841970), so the user cannot legally redefine it. That said, this is clearly a real bug in GCC. memcpy has a well-defined interface; GCC emits calls violating that interface; therefore GCC is buggy. I do not see why this is even controversial. (Also see Julian Seward's description of real-world problems in comment 5.) GCC either needs to provide its own memcpy or it needs to respect the interface. The cost of the latter is surely trivial, especially since many cases could be optimized by alias analysis.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 Alexander Cherepanov changed: What|Removed |Added CC||ch3root at openwall dot com --- Comment #13 from Alexander Cherepanov --- This bug could be reproduced with gcc 7.0.0 20160424 on x86-64 with this example:

int main() {
    volatile struct {
        char s[8193];    // gcc
        //char s[129];   // clang
    } s = {""};
    s = s;
}
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 D. Hugh Redelmeier changed: What|Removed |Added CC||hugh at mimosa dot com --- Comment #12 from D. Hugh Redelmeier --- JJ in #11 is right. The compiler should never generate calls to functions with names in the user's space. That's because the user is allowed to (re)define anything with those names and is not constrained to preserve expected semantics. There should be a function in the reserved-to-system namespace that does what is needed. GCC should then call this function. Let's say that GCC uses __memcpy. The default standard definition of memcpy could be a synonym (until redefinition) for __memcpy. On the other hand, there are compile-time constraints on struct assignments that could yield even better performance. For example, GCC might know that the object is a multiple of 8 bytes, aligned on an 8-byte boundary. GCC would certainly know that the struct's length is non-zero, conceptually eliminating one test. This suggests implementing specialized variants, perhaps having names starting with __struct_assign. (I know GCC has an extension allowing 0 length objects. The assignment for such objects could be eliminated.)
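A hedged sketch of one such specialized variant. The name `struct_assign_words` is hypothetical: a real helper would live in the reserved namespace (the comment suggests a `__struct_assign` prefix, which user code must not define), so this sketch uses an ordinary name. It assumes the facts the comment says GCC would know statically: non-zero length, 8-byte alignment, and a size that is a multiple of 8. The `assert`s only document those preconditions. The self-assignment test reflects that exact overlap is the only overlap C assignment permits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct-assignment helper for objects whose size is
   nwords * 8 bytes. Preconditions a compiler would know statically:
   - nwords > 0, so no zero-length test is needed;
   - dst and src are 8-byte aligned;
   - dst and src are identical or disjoint (C11 6.5.16.1p3). */
static void struct_assign_words(void *dst, const void *src, size_t nwords)
{
    assert(nwords > 0);
    assert((uintptr_t)dst % 8 == 0 && (uintptr_t)src % 8 == 0);

    if (dst == src)              /* self-assignment: nothing to do */
        return;

    unsigned char *d = dst;
    const unsigned char *s = src;
    do {                         /* nwords > 0: test only at the bottom */
        uint64_t w;
        memcpy(&w, s, sizeof w); /* alias-safe 8-byte load  */
        memcpy(d, &w, sizeof w); /* alias-safe 8-byte store */
        d += sizeof w;
        s += sizeof w;
    } while (--nwords != 0);
}
```

The fixed-size `memcpy` calls are a portable way to express word loads and stores without violating aliasing rules; optimizing compilers typically turn them into single moves.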
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #11 from Jakub Jelinek jakub at gcc dot gnu.org --- And another way to fix this is to add a new library entrypoint that would have the same semantics as memcpy, but would allow full overlap (dst == src). When we e.g. expand this without a library call, we could do exactly what we do for memcpy, because we know we handle the dst == src case fine. Similarly, e.g. in glibc it could very well be just another alias to memcpy, because we know it handles that too. On targets which would not have the library function we'd use the #c10 approach. Of course, this would require coordination between glibc, gcc, valgrind, libsanitizer, memstomp etc.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #10 from Richard Biener rguenth at gcc dot gnu.org --- One way to fix this is to emit the memcpy as if (p != q) memcpy (p, q, ...); but of course that comes at a cost in code-size and runtime for no obvious good reason (in practice).
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 Andrew Pinski pinskia at gcc dot gnu.org changed: What|Removed |Added CC||msebor at gmail dot com --- Comment #9 from Andrew Pinski pinskia at gcc dot gnu.org --- *** Bug 65029 has been marked as a duplicate of this bug. ***
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 Richard Guenther rguenth at gcc dot gnu.org changed:
What|Removed|Added
Summary|builtin operator= generates memcpy with overlapping memory regions|block copy with exact overlap is expanded as memcpy
Known to fail||4.7.0
--- Comment #8 from Richard Guenther rguenth at gcc dot gnu.org 2011-12-06 08:44:46 UTC --- This is a generic middle-end issue (at best). There is at least one duplicate of this bug (can't find it with a quick search).