[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2024-06-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

Andrew Pinski  changed:

   What    |Removed |Added

   See Also|        |https://github.com/llvm/llvm-project/issues/17135,
           |        |https://github.com/llvm/llvm-project/issues/12135

--- Comment #65 from Andrew Pinski  ---
(In reply to Patrick J. LoPresti from comment #64)
> This bug should be trivial to fix by checking for self-assignment before
> calling memcpy(). Doesn't GCC inline the assignments for small objects
> anyway?

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2024-06-18 Thread lopresti at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #64 from Patrick J. LoPresti  ---
The C (and POSIX) standards have had "restrict" on the arguments to memcpy()
since C99. So calling it with overlapping arguments is undefined behavior and
always has been.

This bug should be trivial to fix by checking for self-assignment before
calling memcpy(). Doesn't GCC inline the assignments for small objects anyway?

I still have no idea why this is even controversial.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2024-06-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

Andrew Pinski  changed:

   What|Removed |Added

     CC|        |mikulas at artax dot karlin.mff.cuni.cz

--- Comment #63 from Andrew Pinski  ---
*** Bug 115541 has been marked as a duplicate of this bug. ***

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2024-01-04 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #62 from Rich Felker  ---
The process described there would have to end at least N bits before the end of
the destination buffer. The point was that it would destroy information
internal to the buffer at each step along the way, before it got to the end.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2024-01-04 Thread rearnsha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #61 from Richard Earnshaw  ---
Then I don't understand what you're trying to say in c57.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2024-01-04 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #60 from Rich Felker  ---
Nobody said anything about writing past end of buffer. Obviously you can't do
that.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2024-01-04 Thread rearnsha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #59 from Richard Earnshaw  ---
Memcpy must never write beyond the end of the specified buffer, even if reading
it is safe.  That wouldn't be thread safe.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2024-01-04 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #58 from Jakub Jelinek  ---
(In reply to Rich Felker from comment #57)
> and more concerned about the consequences of LTO/whole-program-analysis where
> something in the translation process can see the violated restrict
> qualifier, infer UB, and blow everything up.

That can't happen, in GCC in GIMPLE these are represented as assignments, not
{__builtin_,}memcpy calls and are turned into the calls (or inline expansion of
the copying) only when being expanded into RTL.
All the LTO/whole program optimizations happen on GIMPLE, so at that point
nothing can be inferred from that because it simply isn't present in the IL and
only after all LTO & IPA optimizations are done individual functions go through
the rest of GIMPLE optimizations and then RTL ones.
The only exception to that is IPA-RA, which intra partition (for LTO, otherwise
within the TU) can take into account what hard registers are used/unused by
previously emitted functions and take that knowledge into their callers emitted
later; but for those this is a library call like any other.
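
A minimal sketch of the kind of source construct under discussion, assuming a struct large enough that the compiler may prefer a library call over inline loads and stores. The names `struct big`, `assign_big` and `self_assign` are hypothetical and only serve the illustration; whether a `memcpy` call is actually emitted depends on the target, the size and the optimization level.

```c
#include <stddef.h>

/* Hypothetical type, chosen large so that a compiler may decide to
   expand the assignment below as a library call instead of inline
   loads and stores. */
struct big {
    unsigned char bytes[4096];
};

/* A plain aggregate assignment. The source contains no memcpy call;
   any such call is introduced by the compiler when the assignment is
   expanded. */
void assign_big(struct big *dst, const struct big *src)
{
    *dst = *src;
}

/* Self-assignment through two pointers to the same object. The C
   assignment rules permit exact overlap here; if assign_big is
   expanded as a memcpy call, this becomes the memcpy (p, p, n) case
   the thread is about. */
void self_assign(struct big *p)
{
    assign_big(p, p);
}
```

At the C level `self_assign` is well defined, which is why the question in this thread is only about what the expanded call may assume.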

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2024-01-04 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #57 from Rich Felker  ---
I think one could reasonably envision an implementation that does some sort of
vector loads/stores where, due to some performance constraint or avoiding
special casing for possible page boundary past the end of the copy, it only
wants to load N bits at a time, but the efficient store instruction always
stores a full vector of 2N bits. Of course, one could also argue quite
reasonably that this is a weird enough thing to do that the implementation
should then just check for src==dest and early-out.

I'm far less concerned about whether such mechanical breakage exists, and more
concerned about the consequences of LTO/whole-program-analysis where something
in the translation process can see the violated restrict qualifier, infer UB,
and blow everything up.

The change being requested here is really one of removing the restrict
qualification from the arguments and making a custom weaker condition. This may
in turn have consequences on what types of transformations are possible.
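
A toy model of the load-N/store-2N scheme envisioned above, assuming N = 4 bytes and assuming the upper half of each wide store holds zeros as "junk". `wide_store_copy` and `N` are hypothetical names, and this is not taken from any real memcpy implementation; it only shows how such a scheme would stay within the buffer yet still lose data when src == dest.

```c
#include <stddef.h>
#include <string.h>

enum { N = 4 };  /* hypothetical load width in bytes */

/* Copies n bytes, where n is a nonzero multiple of N, using N-byte
   loads and 2N-byte stores. Illustration only. */
void wide_store_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    unsigned char reg[2 * N];  /* stands in for a 2N-byte vector register */
    size_t i = 0;

    /* Every chunk except the last: the wide store spills N junk bytes
       ahead of the copied data, which the next step is expected to
       overwrite. The loop bound keeps i + 2N <= n, so nothing is
       written past the end of the destination. */
    for (; i + N < n; i += N) {
        memset(reg, 0, sizeof reg);   /* upper half: junk (zeros here) */
        memcpy(reg, src + i, N);      /* N-byte load */
        memcpy(dst + i, reg, 2 * N);  /* 2N-byte store */
    }

    /* Last chunk: narrow store, so the copy ends exactly at dst + n. */
    memcpy(reg, src + i, N);
    memcpy(dst + i, reg, N);
}
```

With distinct buffers each junk half is overwritten by the following step and the result is a faithful copy. With dst == src, under these assumptions, the junk lands on source bytes that have not been loaded yet, so everything after the first N bytes is lost.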

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2024-01-04 Thread rearnsha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #56 from Richard Earnshaw  ---
I've never heard of a memcpy implementation that corrupts data if called with
memcpy (p, p, n).  (The problems come from partial overlaps where the direction
of the copy may matter).

Has anybody considered asking the standards committee to bless this as a
special exception?

Of course, if n is large, then performing an early test is still worthwhile,
but for small n, the cost of the check possibly exceeds the benefit of eliding
the copy.
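
To make the distinction between exact and partial overlap concrete, here is a minimal sketch using a byte-by-byte forward copy. `naive_forward_copy` is a hypothetical helper, not how any real libc implements memcpy; it has no `restrict`, so calling it with overlapping buffers is well defined and the effect can be observed.

```c
#include <stddef.h>

/* Illustrative forward copy, one byte at a time. No restrict
   qualifiers, so overlapping arguments are permitted here. */
void *naive_forward_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];  /* with dst == src this stores each byte onto itself */
    return dst;
}
```

With `dst == src` every byte is rewritten with its own value and the buffer is unchanged. With `dst == src + 1` the first byte is smeared across the whole destination, which is the direction-dependent problem mentioned above.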

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-28 Thread fw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #55 from Florian Weimer  ---
(In reply to post+gcc from comment #52)
> For the point discussed earlier with the `restrict` in the musl memcpy, I
> had another look at the definition of `restrict` and it's not entirely clear
> to me any more that there is UB here. The restrict rules only apply to
> objects that are "also modified (by any means)". Now the question is, does
> "*X = *X" modify the object?

C11 says this:

| NOTE: “Modify” includes the case where the new value being stored is the same
| as the previous value.

So at least this should be quite clear (although I think notes are supposedly
informal).

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-28 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #54 from Richard Biener  ---
(In reply to post+gcc from comment #52)
> For the point discussed earlier with the `restrict` in the musl memcpy, I
> had another look at the definition of `restrict` and it's not entirely clear
> to me any more that there is UB here. The restrict rules only apply to
> objects that are "also modified (by any means)". Now the question is, does
> "*X = *X" modify the object? Is a write always a modification or only if the
> stored representation changes or only if the stored value changes?
> 
> If it requires a representation change, then "memcpy(x, x, n)" does not
> modify anything, and hence there is no UB from "restrict".

Heh, interesting splitting of hairs ;)  I guess you need to file a DR with the
language to resolve this question ...

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-28 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #53 from rguenther at suse dot de  ---
On Tue, 28 Nov 2023, post+gcc at ralfj dot de wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667
> 
> --- Comment #51 from post+gcc at ralfj dot de ---
> Oh great, I love it when one part of the C standard just adds exceptions to
> statements made elsewhere. It's almost as if the authors want this to be as
> hard to understand as possible...
> 
> That then raises the question which version of the signature is actually used
> for building (and optimizing) the function: the one in the declaration or the
> one in the definition. Does the standard have an answer to that?

For avoidance of doubt the frontends should drop non-semantic qualifiers
from declarations then just in case the middle-end tries to apply
semantics there.  Like it does for const qualified reference arguments
(OK, that's not C but C++).  The middle-end also uses the qualifiers
for diagnostic purposes.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-27 Thread post+gcc at ralfj dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #52 from post+gcc at ralfj dot de ---
For the point discussed earlier with the `restrict` in the musl memcpy, I had
another look at the definition of `restrict` and it's not entirely clear to me
any more that there is UB here. The restrict rules only apply to objects that
are "also modified (by any means)". Now the question is, does "*X = *X" modify
the object? Is a write always a modification or only if the stored
representation changes or only if the stored value changes?

If it requires a representation change, then "memcpy(x, x, n)" does not modify
anything, and hence there is no UB from "restrict".

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-27 Thread post+gcc at ralfj dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #51 from post+gcc at ralfj dot de ---
Oh great, I love it when one part of the C standard just adds exceptions to
statements made elsewhere. It's almost as if the authors want this to be as
hard to understand as possible...

That then raises the question which version of the signature is actually used
for building (and optimizing) the function: the one in the declaration or the
one in the definition. Does the standard have an answer to that?

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-27 Thread joseph at codesourcery dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #50 from joseph at codesourcery dot com  ---
Qualifiers on function parameter types do not affect type compatibility or 
composite type (see 6.7.6.3#14).  I think they're only actually of 
significance in the definition; in a declaration they effectively serve as 
documentation.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-27 Thread post+gcc at ralfj dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #49 from post+gcc at ralfj dot de ---
Even glibc itself seems to use `restrict`:

https://codebrowser.dev/glibc/glibc/string/string.h.html#43

So the compiler building glibc might inadvertently rely on the memory written
through dst and the memory read through src being disjoint, making even the
perfectly-overlapping case UB (unless the implementation has a guard somewhere
that skips the copy when src==dst, but I was not able to find such a guard).

(The implementation at https://codebrowser.dev/glibc/glibc/string/memcpy.c.html
does not have the `restrict`, but it includes the string.h header and I think
the compiler is allowed to apply attributes from the declaration to the
definition. Or, alternatively, it might even be UB to have `restrict` in one
place and not the other: "All declarations that refer to the same object or
function shall have compatible type; otherwise, the behavior is undefined" [C23
§6.2.7.2] and "For two qualified types to be compatible, both shall have the
identically qualified version of a compatible type; the order of type
qualifiers within a list of specifiers or qualifiers does not affect the
specified type" [C23 §6.7.3.11].)

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-24 Thread post+gcc at ralfj dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #48 from post+gcc at ralfj dot de ---
> Note, clang makes the same assumption apparently (while MSVC emits rep movs 
> inline and ICC either that, or calls _intel_fast_memcpy).

MSVC does the same thing as clang and GCC, if godbolt is to be trusted:
https://rust.godbolt.org/z/o7TevfvcY

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #47 from Jakub Jelinek  ---
(In reply to Richard Biener from comment #46)
> So yes, the most clean solution would be to have __forgiving_memcpy
> possibly also allowing NULL pointers when n == 0 besides allowing
> the exact overlap.  Its prototype wouldn't have restrict then or
> nonnull.

That is basically what I've argued above (however it is called), and completely
agree also on the NULL pointers when n == 0 case for it, I think that one came
up in the Honza vs. Jonathan libstdc++ discussions again recently.
And, ideally it could be implemented in libc as an alias to memcpy if e.g. the
assembly written memcpy satisfies all the requirements in it, but valgrind
could implement it separately and do there if (n == 0 || dst == src) return
dst; return memcpy (dst, src, n);
Of course, it would take some time, because it needs to be in libc first, gcc
needs to key on the versions which have it.  But then possibly in such case
could also fold
if (n != 0)
  memcpy (dst, src, n);
into
  __forgiving_memcpy (dst, src, n);
or __builtin_* variants thereof.
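
A sketch of the separate implementation described above for tools such as valgrind, assuming the semantics proposed in this thread (exact overlap allowed, NULL allowed when n == 0). The name `forgiving_memcpy_sketch` is hypothetical; the thread only floats `__forgiving_memcpy` as a possible name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: no restrict and no nonnull on the parameters,
   so exact overlap and (NULL, NULL, 0) are both acceptable inputs. */
void *forgiving_memcpy_sketch(void *dst, const void *src, size_t n)
{
    if (n == 0 || dst == src)
        return dst;               /* nothing to copy; memcpy is not called */
    return memcpy(dst, src, n);   /* ordinary memcpy contract applies here */
}
```

A libc whose memcpy already tolerates both cases could instead export the new symbol as an alias, as suggested above, and pay nothing for the check.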

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #46 from Richard Biener  ---
I'll note that restrict on the parameters can help optimizing the copying loop
because you can then unroll and hoist loads before stores.  I'll also note that
almost all (important) targets implement memcpy with special assembler in libc.

But yes, a compiler might, seeing

void foo (char * __restrict d, char * s)
{
  *d = *s;
  if (d != s)
    abort ();
}

optimize the compare to always true because of the wording of the C standard
with regard to restrict.  GCC doesn't (yet) do this.  A compiler might also
transform

 foo (p, p);

to trap (as part of eliminating paths with undefined behavior) if it
can analyze foo inter-procedurally seeing that foo will always invoke
undefined behavior if d == s.

For memcpy the same would hold true but the compiler would need to know
that the size of the copy is not zero.

So yes, the most clean solution would be to have __forgiving_memcpy
possibly also allowing NULL pointers when n == 0 besides allowing
the exact overlap.  Its prototype wouldn't have restrict then or
nonnull.

This all doesn't change the fact that GCC up to now always assumed
memcpy with exact overlap is fine to do, so it's practically impossible
to change implementations without recompiling the world with a "fixed"
compiler.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-24 Thread public at timruffing dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #45 from Tim Ruffing  ---
Ha, I didn't want to move to discussion about restrict, but as a final remark:
The formal semantics of restrict are not easy to parse and understand (see
https://www.iso-9899.info/n3047.html#6.7.3.1 ). But just the sentence "If L is
used to access the value of the object X that it designates, and X is also
modified (by any means), then the following requirements apply ..." should make
clear that no requirement is made at all if no object is modified. And this is
clearly the case for `if (src==dest) return;`

Besides that, I agree with what Ralf said:
> I'm not a compiler dev nor a libc dev, I just want to make sure that my 
> compiler and my libc use the same contract when talking to each other -- but 
> I hope someone who is a compiler dev or a libc dev can go and actually test 
> these hypotheses, rather than just speculating about it as has been happening 
> so far.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-23 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #44 from Rich Felker  ---
My naive expectation is that "if ((uintptr_t)src == 0x400400)" is and should be
UB, but I may be misremembering the details of the formalism by which the spec
for restrict is implemented.

If so, that's kinda a help, but I still think you would want to remove restrict
from the arguments and apply it later, so that the fast-path head/tail copies
can avoid any branch, and the check for equality can be deferred until it's
known that there's a "body remainder" to copy. That's the part where you really
want the benefits of restrict anyway -- without restrict it's not vectorizable
because the compiler has to assume there might be nonexact overlap, in which
case reordering the loads and stores in any way could change the result.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-23 Thread post+gcc at ralfj dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #43 from post+gcc at ralfj dot de ---
That is not my reading of the standard, but absent a proper (formal,
mathematical) spec I guess it is hard to tell.

With your reading, "if ((uintptr_t)src == 0x400400)" is UB, since changing the
"src" argument to a different copy located at that address would change the
execution. I strongly doubt that is the intent of the standard.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-23 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #42 from Rich Felker  ---
> I'm not saying that such an implementation will be a good idea, but just a 
> remark: You could, in fact, keep restrict for the arguments in this case, 
> because the object pointed to by src and dest is not accessed at all when 
> src==dest. So this is correct code according to the standard. (The exact 
> semantics of restrict are a bit involved...)

Nope, UB is invoked as soon as you evaluate src==dest, even with no
dereferencing. The semantics of restrict are such that the behavior of the code
must be unchanged if the pointer were replaced with a pointer to a relocated copy
of the pointed-to object. Since this would alter the result of the == operator,
that constraint is not satisfied and thereby the behavior is undefined.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-23 Thread post+gcc at ralfj dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #41 from post+gcc at ralfj dot de ---
> This entitles a compiler to emit asm equivalent to if (src==dest) system("rm 
> -rf /") if it likes.

No it does not. restrict causes UB if the two pointers are used to access the
same memory. It has nothing to do with whether the pointers are equal. So it
would have to be "if (src==dest && n>0)" and the compiler would have to first
prove that "n>0" implies that later accesses through both pointers occur at
offset 0 (and at least one of them is a write).

But it's still UB to call this the way GCC does, that much I agree with.

> Our memcpy is not written in asm but in C, and it has the restrict qualifier 
> on src and dest.

The question is, does that qualifier help? If you remove it, does the generated
assembly change in any way, does the performance change? If not, it clearly
doesn't matter and can just be removed. If yes, then yeah compilers clearly
shouldn't call this with identical ranges.

Basically, compiler devs are claiming that libc can support the src==dest case
"for free", without any noticeable cost to other uses of the function. libc
devs are claiming that compilers can insert a branch that tests for equality,
without any noticeable cost. Both of these are testable hypotheses. I'm not a
compiler dev nor a libc dev, I just want to make sure that my compiler and my
libc use the same contract when talking to each other -- but I hope someone who
is a compiler dev or a libc dev can go and actually test these hypotheses,
rather than just speculating about it as has been happening so far.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-23 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #40 from Sam James  ---
https://git.musl-libc.org/cgit/musl/tree/src/string/memcpy.c#n5

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-23 Thread public at timruffing dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #39 from Tim Ruffing  ---
(In reply to Rich Felker from comment #36)

> Our memcpy is not written in asm but in C, and it has the restrict qualifier
> on src and dest. 

For my curiosity, can you point me to this implementation? I couldn't find a
memcpy with restrict in the libc source, but I suspect that's just because I'm
not familiar with the code base.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-23 Thread public at timruffing dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #38 from Tim Ruffing  ---
(In reply to Rich Felker from comment #36)
> If you're happy with a branch, you could probably take restrict off the
> arguments and do something like:
> 
> if (src==dest) return;
> const char *restrict src2 = src;
> char *restrict dest2 = dest;
> ...
> 


I'm not saying that such an implementation will be a good idea, but just a
remark: You could, in fact, keep restrict for the arguments in this case,
because the object pointed to by src and dest is not accessed at all when
src==dest. So this is correct code according to the standard. (The exact
semantics of restrict are a bit involved...)

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-23 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #37 from Rich Felker  ---
Also: has anyone even looked at what happens if a C memcpy with proper use of
restrict gets LTO-inlined into a caller with GCC-generated memcpy call where
src==dest? That sounds like a case likely to blow up...

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-23 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #36 from Rich Felker  ---
> the assembly generated by the current implementations already supports that 
> case.

Our memcpy is not written in asm but in C, and it has the restrict qualifier on
src and dest. This entitles a compiler to emit asm equivalent to if (src==dest)
system("rm -rf /") if it likes. I don't know how you can write a valid C
implementation of memcpy that "doesn't care" about 100% overlap without giving
up restrict (and the benefits it entails) entirely.

If you're happy with a branch, you could probably take restrict off the
arguments and do something like:

if (src==dest) return;
const char *restrict src2 = src;
char *restrict dest2 = dest;
...

but that's shoving the branch into memcpy where it's a cost on every caller
making dynamic memcpys with potentially tiny size (like qsort, etc.) and
obeying the contract not to call with overlapping src/dest, rather than just
imposing it on bad callers.
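
The fragment above, filled out into a complete function as a rough sketch. `memcpy_checked_sketch` is a hypothetical name and the byte loop stands in for whatever copy body a real implementation would use; this is not musl's memcpy.

```c
#include <stddef.h>

/* Hypothetical C memcpy variant: the parameters are not restrict
   qualified, so comparing them is unproblematic. restrict is applied
   only to local pointers introduced after the equality check. */
void *memcpy_checked_sketch(void *dest, const void *src, size_t n)
{
    if (src == dest)
        return dest;  /* exact overlap: nothing to do */

    /* From here on the usual non-overlap contract is assumed, which
       lets the compiler reorder or vectorize the loads and stores. */
    const unsigned char *restrict src2 = src;
    unsigned char *restrict dest2 = dest;

    for (size_t i = 0; i < n; i++)
        dest2[i] = src2[i];
    return dest;
}
```

Partial overlap remains undefined with this sketch, exactly as with memcpy; only the src == dest case is made safe, at the price of the branch discussed above.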

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-23 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #35 from Jakub Jelinek  ---
Note, clang makes the same assumption apparently (while MSVC emits rep movs
inline and ICC either that, or calls _intel_fast_memcpy).  As I mentioned
earlier, if glibc and other libraries provide an alias to memcpy or memmove
(whichever guarantees that exact overlap will be handled right in the
implementation), GCC could start using that instead of memcpy for those cases.
And it would be up to the libraries to decide which implementation satisfies
that.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-23 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #34 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:7758cb4b53e8a33642709402ce582f769eb9fd18

commit r14-5772-g7758cb4b53e8a33642709402ce582f769eb9fd18
Author: Richard Biener 
Date:   Thu Nov 23 08:54:56 2023 +0100

middle-end/32667 - document cpymem and memcpy exact overlap requirement

The following amends the cpymem documentation to mention that exact
overlap needs to be handled gracefully, also noting that the target
runtime is expected to behave the same way where -ffreestanding
docs mention the set of routines required.

PR middle-end/32667
* doc/md.texi (cpymem): Document that exact overlap of source
and destination needs to work.
* doc/standards.texi (ffreestanding): Mention memcpy is required
to handle the exact overlap case.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-23 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #33 from rguenther at suse dot de  ---
On Thu, 23 Nov 2023, fw at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667
> 
> --- Comment #32 from Florian Weimer  ---
> There's this in standards.texi:
> 
> Most of the compiler support routines used by GCC are present in
> @file{libgcc}, but there are a few exceptions.  GCC requires the
> freestanding environment provide @code{memcpy}, @code{memmove},
> @code{memset} and @code{memcmp}.
> Finally, if @code{__builtin_trap} is used, and the target does
> not implement the @code{trap} pattern, then GCC emits a call
> to @code{abort}.
> 
> Maybe that would be a place to mention this issue, too?

Will add.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-23 Thread fw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #32 from Florian Weimer  ---
There's this in standards.texi:

Most of the compiler support routines used by GCC are present in
@file{libgcc}, but there are a few exceptions.  GCC requires the
freestanding environment provide @code{memcpy}, @code{memmove},
@code{memset} and @code{memcmp}.
Finally, if @code{__builtin_trap} is used, and the target does
not implement the @code{trap} pattern, then GCC emits a call
to @code{abort}.

Maybe that would be a place to mention this issue, too?

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #31 from Richard Biener  ---
(In reply to Patrick J. LoPresti from comment #29)
> (In reply to Jakub Jelinek from comment #27)
> > 
> > No, that is not a reasonable fix, because it severely pessimizes common code
> > for a theoretical only problem.
> 
> The very existence of (and interest in) this bug report means it is
> obviously not "a theoretical only problem".
> 
> And of course Rich Felker is correct that the cost of the obvious fix is
> trivial and not remotely "severe".

But I didn't see a patch proposed to address this issue, which means it
doesn't seem to be trivial.

> But the bottom line is that GCC is emitting library calls that invoke
> undefined behavior. At a minimum, GCC should document this non-standard
> requirement on its runtime environment. Has anyone bothered to do that? Why
> not?

I think it's written down somewhere but I can't quickly find it (I also
wonder where exactly the best place to document would be - it's related
to porting GCC to a new target architecture I guess, not so much user-facing).

OTOH I see

@cindex @code{cpymem@var{m}} instruction pattern 
@item @samp{cpymem@var{m}}
...
The @code{cpymem@var{m}} patterns need not give special consideration
to the possibility that the source and destination strings might
overlap. These patterns are used to do inline expansion of
@code{__builtin_memcpy}.

which is possibly the closest piece we have and which fails to mention
exact overlap.  I'll propose an adjustment to this.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-22 Thread post+gcc at ralfj dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #30 from post+gcc at ralfj dot de ---
There have been several assertions above that a certain way to solve this
either has no performance cost at all or severe performance cost. That sounds
like we are missing data -- ideally, someone would benchmark the actual cost of
emitting that branch. It seems kind of pointless to just make assertions about
the impact of this change without real data.

> On the other hand, expecting the libc memcpy to make this check greatly 
> pessimizes every reasonable small use of memcpy with a gratuitous branch for 
> what is undefined behavior and should never appear in any valid program.

I don't think this is true. As far as I can see, the performance impact of
having memcpy support the src==dest case is zero -- the assembly generated by
the current implementations already supports that case. (At least I have not
seen any evidence to the contrary.) No new check in memcpy is required.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-22 Thread lopresti at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #29 from Patrick J. LoPresti  ---
(In reply to Jakub Jelinek from comment #27)
> 
> No, that is not a reasonable fix, because it severely pessimizes common code
> for a theoretical only problem.

The very existence of (and interest in) this bug report means it is obviously
not "a theoretical only problem".

And of course Rich Felker is correct that the cost of the obvious fix is
trivial and not remotely "severe".

But the bottom line is that GCC is emitting library calls that invoke undefined
behavior. At a minimum, GCC should document this non-standard requirement on
its runtime environment. Has anyone bothered to do that? Why not?

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-22 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #28 from Rich Felker  ---
> No, that is not a reasonable fix, because it severely pessimizes common code 
> for a theoretical only problem.

Far less than a call to memmove (which necessarily has something comparable to
that and other unnecessary branches) pessimizes it.

I also disagree that it's severe. On basically any machine with branch
prediction, the branch will be predicted correctly all the time and has
basically zero cost. On the other hand, the branches in memmove could go
different ways depending on the caller, so it's much more
machine-capability-dependent whether they can be predicted.

In some sense the optimal thing to do is "nothing", just assuming it would be
hard to write a memcpy that fails on src==dest. However, at the very least this
precludes hardened memcpy trapping on src==dest, which might be a useful
hardening feature (or rather on a range test for overlapping, which would
happen to also catch exact overlap). So it would be nice if it were fixed.

FWIW, I don't think single branches are relevant to overall performance in
cases where the compiler is doing something reasonable by emitting a call to
memcpy to implement assignment. If the object is small enough that the branch
is relevant, the call overhead is even more of a big deal, and it should be
inlining loads/stores to perform the assignment.
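
The range test mentioned above (one that would also catch exact overlap) might look roughly like the following C sketch. The names `ranges_overlap` and `hardened_memcpy` are hypothetical and do not come from any libc, and comparing addresses through `uintptr_t` assumes a flat address space.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true if [a, a+n) and [b, b+n) share at least one byte.
   Converting to uintptr_t assumes a flat address space. */
bool ranges_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t x = (uintptr_t)a;
    uintptr_t y = (uintptr_t)b;
    uintptr_t distance = x > y ? x - y : y - x;
    return n != 0 && distance < n;
}

/* Hypothetical hardened wrapper: trap on any overlap, including dst == src. */
void *hardened_memcpy(void *dst, const void *src, size_t n)
{
    if (ranges_overlap(dst, src, n))
        abort();
    return memcpy(dst, src, n);
}
```

With a wrapper like this, a compiler-emitted self-assignment call with dst == src would abort, which is why such hardening is incompatible with the current behavior.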

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #27 from Jakub Jelinek  ---
(In reply to Rich Felker from comment #26)
> > The only reasonable fix on the compiler side is to never emit memcpy but 
> > always use memmove.
> 
> No, it can literally just emit (equivalent at whatever intermediate form of):
> 
> cmp src,dst
> je 1f
> call memcpy
> 1:
> 
> in place of memcpy.

No, that is not a reasonable fix, because it severely pessimizes common code
for a theoretical only problem.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-22 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #26 from Rich Felker  ---
> The only reasonable fix on the compiler side is to never emit memcpy but 
> always use memmove.

No, it can literally just emit (equivalent at whatever intermediate form of):

cmp src,dst
je 1f
call memcpy
1:

in place of memcpy.

It can even optimize out that in the case where it's provable that they're not
equal, e.g. presence of restrict or one of the two objects not having had its
address taken/leaked.
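
Expressed in C rather than assembly, the proposed expansion of a large struct assignment might look like the sketch below. The type `struct big` and the function `assign_big` are hypothetical names for illustration; this is not what GCC emits today.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical object large enough that the compiler would call memcpy. */
struct big { char bytes[8192]; };

/* Sketch of "*dst = *src" with the proposed guard: the library call is
   skipped when both pointers are equal, so memcpy never sees exact overlap. */
void assign_big(struct big *dst, const struct big *src)
{
    if (dst != src)
        memcpy(dst, src, sizeof *dst);
}
```

Where the compiler can prove the pointers differ (for example, with restrict-qualified parameters), the comparison could be dropped and only the call would remain.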

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-21 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #25 from rguenther at suse dot de  ---
On Tue, 21 Nov 2023, bugdal at aerifal dot cx wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667
> 
> Rich Felker  changed:
> 
>What|Removed |Added
> 
>  CC||bugdal at aerifal dot cx
> 
> --- Comment #24 from Rich Felker  ---
> If the copy is such that gcc is happy to emit an external call to memcpy for
> it, there is no significant size or performance cost to emitting a branch
> checking for equality before making the call, and performing this branch would
> greatly optimize the (maybe rare in the caller, maybe not) case of
> self-assignment!
> 
> On the other hand, expecting the libc memcpy to make this check greatly
> pessimizes every reasonable small use of memcpy with a gratuitous branch for
> what is undefined behavior and should never appear in any valid program.
> 
> Fix it on the compiler side please.

The only reasonable fix on the compiler side is to never emit memcpy
but always use memmove.  The pessimization of that compared to the
non-existing "real" issue with calling memcpy with exact overlap
is the reason for the non-action.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-21 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

Rich Felker  changed:

   What|Removed |Added

 CC||bugdal at aerifal dot cx

--- Comment #24 from Rich Felker  ---
If the copy is such that gcc is happy to emit an external call to memcpy for
it, there is no significant size or performance cost to emitting a branch
checking for equality before making the call, and performing this branch would
greatly optimize the (maybe rare in the caller, maybe not) case of
self-assignment!

On the other hand, expecting the libc memcpy to make this check greatly
pessimizes every reasonable small use of memcpy with a gratuitous branch for
what is undefined behavior and should never appear in any valid program.

Fix it on the compiler side please.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-10-28 Thread post+gcc at ralfj dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

post+gcc at ralfj dot de changed:

   What|Removed |Added

 CC||post+gcc at ralfj dot de

--- Comment #23 from post+gcc at ralfj dot de ---
>  Is glibc community ready to provide such guarantee?

This is indeed a key question here I think. Currently GCC makes assumptions
that *even the libc produced by the same project* does not document as stable
guarantees. That's rather dissonant. The GNU project should at least within
itself come to a proper conclusion on the question of whether memcpy should be
UB or not when both pointers are equal. Right now we have everyone pointing at
everyone else, and users are left in the rain with their valgrind errors.

Ideally of course the C standard would be updated to ensure that slowly but
steadily, the memcpy contract is updated to match reality. That will take a
while, but given that this issue was filed 16 years ago (!), there clearly
would have been enough time. (If someone does take this up, please join forces
with the clang people who are interested in getting C updated:
https://reviews.llvm.org/D86993#4585590).

But GNU controls glibc so there's not really any excuse for not updating those
docs, I think? glibc making such a move would be a great step towards
convincing valgrind and the C committee that memcpy should have defined
behavior when both pointers are equal.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-01-06 Thread lopresti at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #22 from Patrick J. LoPresti  ---
I disagree that bug 108296 is a duplicate. That bug requires code that, at
least arguably, invokes undefined behavior. See e.g.
https://stackoverflow.com/q/7292862/ and https://stackoverflow.com/q/61070828/.

This bug is about clearly valid C++ code (object self-assignment) for which GCC
emits clearly invalid calls to memcpy() (with dest == src).

Now, I suspect what Andrew is thinking is that both of these bugs could be
resolved by invoking memmove() instead of memcpy(). That seems like a
reasonable idea to me, since small assignments get inlined and large
assignments can amortize the overhead.

But this bug could also be resolved in other ways, a few of which have been
suggested in these comments.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-01-05 Thread nyh at math dot technion.ac.il via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #21 from Nadav Har'El  ---
This old problem became a real one in gcc 12, where it causes incorrect code
generation: code that copies an object was incorrectly "optimized" to use
__builtin_memcpy() instead of __builtin_memmove() even though the source and
destination objects may overlap, and it turns out that __builtin_memcpy() may
produce incorrect results for overlapping addresses.

This bug was discovered in the OSv project
https://github.com/cloudius-systems/osv/issues/1212 with code that doesn't
(obviously) call __builtin_memcpy() directly, but rather had a 27-character
type being copied and the compiler implemented this copy with a call to
__builtin_memcpy(). Here is an example of code which generates the wrong
results (note the missing "c" in the result and unexpectedly doubled "g"):

#include <cstdio>
int main(){
    char buf[128] = "0123456789abcdefghijklmnopqrstuvwxyz";
    struct [[gnu::packed]] data {
        char x[27];
    };
    void *p0 = buf;
    void *p1 = &buf[1];
    *static_cast<data*>(p0) = *static_cast<data*>(p1);
    printf("%s", buf);
}
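
For comparison, the copy in the example above moves 27 bytes from buf + 1 to buf, so the two regions overlap partially. A minimal C sketch of the same copy with an explicit memmove follows. The function name `shift_left_27` is hypothetical.

```c
#include <string.h>

/* Copy 27 bytes from buf + 1 to buf. The source and destination overlap,
   so memmove (which is defined for overlapping regions) is used, not memcpy. */
void shift_left_27(char *buf)
{
    memmove(buf, buf + 1, 27);
}
```

Applied to the 36-character string from the example, this yields "123456789abcdefghijklmnopqrrstuvwxyz". The doubled "r" is expected because byte 27 is read but never overwritten. No character is missing.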

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-01-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

Andrew Pinski  changed:

   What|Removed |Added

 CC||nyh at math dot technion.ac.il

--- Comment #20 from Andrew Pinski  ---
*** Bug 108296 has been marked as a duplicate of this bug. ***

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2021-06-10 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #19 from rguenther at suse dot de  ---
On Wed, 9 Jun 2021, public at timruffing dot de wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667
> 
> Tim Ruffing  changed:
> 
>What|Removed |Added
> 
>  CC||public at timruffing dot de
> 
> --- Comment #18 from Tim Ruffing  ---
> Is there still interest in fixing this? This still leads to spurious Valgrind
> warnings in the real world.

I don't think this is ever going to be "fixed" on the GCC side, instead
maybe valgrind should be at least given an option to ignore those cases.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2021-06-09 Thread public at timruffing dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

Tim Ruffing  changed:

   What|Removed |Added

 CC||public at timruffing dot de

--- Comment #18 from Tim Ruffing  ---
Is there still interest in fixing this? This still leads to spurious Valgrind
warnings in the real world.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2016-04-25 Thread ch3root at openwall dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #17 from Alexander Cherepanov  ---
On 2016-04-25 17:34, joseph at codesourcery dot com wrote:
> Or it could simply document an additional requirement on the C library
> used with GCC (that the case of exact overlap works), just as it documents
> the requirement that some functions such as memcpy be provided even in the
> freestanding case.  The compiler and library implementations need to
> cooperate to produce a conforming C implementation.

1. Is glibc community ready to provide such guarantee? The doc[1] 
doesn't currently mention an exception for exact overlap.

[1] 
https://www.gnu.org/software/libc/manual/html_node/Copying-Strings-and-Arrays.html#index-memcpy

2. I agree that it should be documented. Together with the assumption 
that memcpy doesn't clobber errno.

3. The decision affects the whole Free Software community. If memcpy
calls with exact overlap inserted by gcc are legitimized then users cannot use
tools like valgrind to catch such cases in their own code. (And neither
_FORTIFY_SOURCE nor UBSan seem to catch overlapping memcpy now.) So it 
effectively leads to legitimizing such memcpy in user's code too. Making 
memcpy closer to a simple assignment (C11, 6.5.16.1p3) could be a good 
thing. Maybe even fix C std or POSIX?


BTW llvm bug is here: https://llvm.org/bugs/show_bug.cgi?id=11763 . It 
also contains links to valgrind discussions.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2016-04-25 Thread lopresti at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #16 from Patrick J. LoPresti  ---
Well, Valgrind itself is another real-world example. Tools like Valgrind cannot
distinguish invalid memcpy() calls by the programmer from invalid memcpy()
calls emitted by GCC.

You can, of course, redefine every standard interface you like and simply
document it. That would be awful engineering -- as in this case -- but you
could do it.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2016-04-25 Thread joseph at codesourcery dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #15 from joseph at codesourcery dot com  ---
On Sun, 24 Apr 2016, lopresti at gmail dot com wrote:

> That said, this is clearly a real bug in GCC. memcpy has a well-defined
> interface; GCC emits calls violating that interface; therefore GCC is buggy. I
> do not see why this is even controversial. (Also see Julian Seward's
> description of real-world problems in comment 5.)

As far as I know, that does not describe real-world problems on any target 
supported by GCC.

> GCC either needs to provide its own memcpy or it needs to respect the
> interface. The cost of the latter is surely trivial, especially since many
> cases could be optimized by alias analysis.

Or it could simply document an additional requirement on the C library 
used with GCC (that the case of exact overlap works), just as it documents 
the requirement that some functions such as memcpy be provided even in the 
freestanding case.  The compiler and library implementations need to 
cooperate to produce a conforming C implementation.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2016-04-24 Thread lopresti at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #14 from Patrick J. LoPresti  ---
D Hugh Redelmeier in comment 12 is mistaken... memcpy is a reserved identifier
(see e.g. http://stackoverflow.com/a/23841970), so the user cannot legally
redefine it.

That said, this is clearly a real bug in GCC. memcpy has a well-defined
interface; GCC emits calls violating that interface; therefore GCC is buggy. I
do not see why this is even controversial. (Also see Julian Seward's
description of real-world problems in comment 5.)

GCC either needs to provide its own memcpy or it needs to respect the
interface. The cost of the latter is surely trivial, especially since many
cases could be optimized by alias analysis.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2016-04-24 Thread ch3root at openwall dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

Alexander Cherepanov  changed:

   What|Removed |Added

 CC||ch3root at openwall dot com

--- Comment #13 from Alexander Cherepanov  ---
This bug could be reproduced with gcc 7.0.0 20160424 on x86-64 with this
example:

int main()
{
  volatile struct {
    char s[8193]; // gcc
    //char s[129]; // clang
  } s = {""};

  s = s;
}

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2015-09-28 Thread hugh at mimosa dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

D. Hugh Redelmeier  changed:

   What|Removed |Added

 CC||hugh at mimosa dot com

--- Comment #12 from D. Hugh Redelmeier  ---
JJ in #11 is right.

The compiler should never generate calls to functions with names in the user's
space.  That's because the user is allowed to (re)define anything with those
names and is not constrained to preserve expected semantics.

There should be a function in the reserved-to-system namespace that does what
is needed.  GCC should then call this function.

Let's say that GCC uses __memcpy.  The default standard definition of memcpy
could be a synonym (until redefinition) for __memcpy.

On the other hand, there are compile-time constraints on struct assignments
that could yield even better performance.  For example, GCC might know that the
object is a multiple of 8 bytes, aligned on an 8-byte boundary.  GCC would
certainly know that the struct's length is non-zero, conceptually eliminating
one test.  This suggests implementing specialized variants, perhaps having
names starting with __struct_assign.

(I know GCC has an extension allowing 0 length objects.  The assignment for
such objects could be eliminated.)


[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2015-02-12 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #11 from Jakub Jelinek jakub at gcc dot gnu.org ---
And another way to fix this is add a new library entrypoint, that would have
the same semantics as memcpy, but would allow full overlap (dst == src).
When we e.g. expand this without a library call, we could do exactly what we do
for memcpy, because we know we do handle the dst == src case fine.  Similarly,
e.g. in glibc it could very well be just another alias to memcpy, because we
know it handles that too.  On targets which would not have the library function
we'd use the #c10 approach.  Of course, this would require coordination
between glibc, gcc, valgrind, libsanitizer, memstomp etc.


[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2015-02-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #10 from Richard Biener rguenth at gcc dot gnu.org ---
One way to fix this is to emit the memcpy as

  if (p != q)
    memcpy (p, q, ...);

but of course that comes at a cost in code-size and runtime for no obvious good
reason (in practice).


[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2015-02-11 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

Andrew Pinski pinskia at gcc dot gnu.org changed:

   What|Removed |Added

 CC||msebor at gmail dot com

--- Comment #9 from Andrew Pinski pinskia at gcc dot gnu.org ---
*** Bug 65029 has been marked as a duplicate of this bug. ***


[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2011-12-06 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

Richard Guenther rguenth at gcc dot gnu.org changed:

           What    |Removed                     |Added

            Summary|builtin operator= generates |block copy with exact
                   |memcpy with overlapping     |overlap is expanded as
                   |memory regions              |memcpy
      Known to fail|                            |4.7.0

--- Comment #8 from Richard Guenther rguenth at gcc dot gnu.org 2011-12-06 08:44:46 UTC ---
This is a generic middle-end issue (at best).  There is at least one duplicate
of this bug (can't find it with a quick search).