[Bug other/56955] documentation for attribute malloc contradicts itself
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 --- Comment #21 from Richard Biener rguenth at gcc dot gnu.org --- Author: rguenth Date: Fri May 23 10:11:03 2014 New Revision: 210848 URL: http://gcc.gnu.org/viewcvs?rev=210848root=gccview=rev Log: 2014-05-22 Paul Eggert egg...@cs.ucla.edu PR other/56955 * doc/extend.texi (Function Attributes): Fix __attribute__ ((malloc)) documentation; the old documentation didn't clearly state the constraints on the contents of the pointed-to storage. Modified: trunk/gcc/ChangeLog trunk/gcc/doc/extend.texi
[Bug other/56955] documentation for attribute malloc contradicts itself
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #22 from Richard Biener rguenth at gcc dot gnu.org --- Fixed.
[Bug other/56955] documentation for attribute malloc contradicts itself
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 Paul Eggert eggert at gnu dot org changed: What|Removed |Added Attachment #32832|0 |1 is obsolete|| --- Comment #20 from Paul Eggert eggert at gnu dot org --- Created attachment 32844 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=32844action=edit Revised-again doc patch for __attribute__ ((malloc)) Patch revised based on further comments on gcc-patches.
[Bug other/56955] documentation for attribute malloc contradicts itself
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2014-05-21 CC||rguenth at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #15 from Richard Biener rguenth at gcc dot gnu.org --- (In reply to Dan Gohman from comment #4) (In reply to comment #3) Well, it _is_ actually about the content. There must be no way to compute a valid pointer to another object from the contents of the pointed-to memory. Oh wow. That's a subtlety that completely escaped me. So if you initialize the memory to {0, 1, 2, 3, 4 ...} thus every possible byte value is somewhere and then do void *p = (void *)(mem[3] 24 | mem[58] 16 | ...); then points-to analysis assumes that from the contents of 'mem' you can only compute pointers to nothing (NULL). Is that example fundamentally different than something like this: void *q = (void *)(mem[0] + 0xb1ab1ab1a); In both cases, the information of the pointer value is in the expression, not in the memory. It's exactly the same. Is it the case that the memory must be either actually zeros or uninitialized? Or could it contain other data which merely transmits no information about pointer values? There must be no way to compute a pointer to an object by _just_ combining bits and bytes of that memory cleverly. So for example initializing the memory to -1 (all bits set) would not work as you can compute a zero from 1 ^ 1 and thus any possible pointer value. That's not possible for zero. Oh wait, you can do ~0. Hmm ... subtle ;) Ok, we ignore pointer values computed from FP values as well, thus I guess technically only undefined content is really really valid. Practically memory with non-pointer values is ok unless you play evil (or very evil) games outlined above. Technically for targets where NULL is a valid poiner to an object calloc () may not be marked with malloc. That is, read it in the way that the code assumes the memory _may_ be zero-initialized (but only zero-initialized) or uninitialized. If this is what it means, then I request that the text be updated to say this. I'd be willing to propose a wording, once I understand the intent, if that'd be helpful. What should we say about the fact that GLIBC uses the malloc attribute on strdup (and similar things)? strdup actually could be used to transmit information about pointer values. True. See above though. Note that the actual implementation (as opposed to what would be allowed by the documentation) does: /* If this is not a real malloc call assume the memory was initialized and thus may point to global memory. All builtin functions with the malloc attribute behave in a sane way. */ if (!fndecl || DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL) make_constraint_from (vi, nonlocal_id); and thus restricts this to known functions (not only malloc, but also strdup which we just expect you don't use to transfer pointers ...). As of 'realloc' - yes we can special-case that in the compiler (we don't do that), but we can't really re-use the existing 'malloc' attribute for that. The proposed revised documentation looks like a good improvement to me, Paul, can you post it to gcc-patches@?
[Bug other/56955] documentation for attribute malloc contradicts itself
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 --- Comment #16 from Richard Biener rguenth at gcc dot gnu.org --- One reason for why realloc is hard is that there is no language that says it is undefined to access the object via the old pointer, but there is only language that says the old and the new pointer values may be equal. Thus, void foo (int *p) { int *q = realloc (p, sizeof (int)); *q = 2; } may I remove the store *q = 2 as dead? The new pointer doesn't escape and thus nothing can read from it ...
[Bug other/56955] documentation for attribute malloc contradicts itself
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 --- Comment #17 from Dan Gohman sunfish at mozilla dot com --- (In reply to Richard Biener from comment #16) One reason for why realloc is hard is that there is no language that says it is undefined to access the object via the old pointer, but there is only language that says the old and the new pointer values may be equal. C89 was unclear, but C99 and now C11 7.22.3.5 say realloc deallocates the old pointer, and there is no mention of the case where the pointers happen to be equal. The interpretation of this to mean that old and new pointers don't alias, even when being comparison-equal, has a serious following.
[Bug other/56955] documentation for attribute malloc contradicts itself
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 --- Comment #18 from Paul Eggert eggert at gnu dot org --- (In reply to Richard Biener from comment #16) void foo (int *p) { int *q = realloc (p, sizeof (int)); *q = 2; } may I remove the store *q = 2 as dead? Yes, the consensus nowadays is that you can. I'll be happy to send the proposed change to gcc-patches but would like to be sure it's correct first. Has this new information about realloc changed your opinion about whether realloc can be given the malloc attribute?
[Bug other/56955] documentation for attribute malloc contradicts itself
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 --- Comment #19 from rguenther at suse dot de rguenther at suse dot de --- On May 21, 2014 5:14:27 PM CEST, eggert at gnu dot org gcc-bugzi...@gcc.gnu.org wrote: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 --- Comment #18 from Paul Eggert eggert at gnu dot org --- (In reply to Richard Biener from comment #16) void foo (int *p) { int *q = realloc (p, sizeof (int)); *q = 2; } may I remove the store *q = 2 as dead? Yes, the consensus nowadays is that you can. I'll be happy to send the proposed change to gcc-patches but would like to be sure it's correct first. Has this new information about realloc changed your opinion about whether realloc can be given the malloc attribute? No, it has not. Richard.
[Bug other/56955] documentation for attribute malloc contradicts itself
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 Paul Eggert eggert at gnu dot org changed: What|Removed |Added CC||eggert at gnu dot org --- Comment #6 from Paul Eggert eggert at gnu dot org --- Created attachment 32831 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=32831action=edit Clarify documentation for __attribute__ ((malloc)). This topic recently came up on the glibc mailing list and there's clearly a lot of confusion about it. See, for example, https://sourceware.org/ml/libc-alpha/2014-05/msg00519.html. Attaching a proposed patch to the documentation to try to help clear this up.
[Bug other/56955] documentation for attribute malloc contradicts itself
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 Carlos O'Donell carlos at redhat dot com changed: What|Removed |Added CC||carlos at redhat dot com --- Comment #7 from Carlos O'Donell carlos at redhat dot com --- (In reply to Paul Eggert from comment #6) Created attachment 32831 [details] Clarify documentation for __attribute__ ((malloc)). This topic recently came up on the glibc mailing list and there's clearly a lot of confusion about it. See, for example, https://sourceware.org/ml/libc-alpha/2014-05/msg00519.html. Attaching a proposed patch to the documentation to try to help clear this up. s/Ussing/Using/g, otherwise the patch in attachment #32831 looks great to me.
[Bug other/56955] documentation for attribute malloc contradicts itself
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 --- Comment #8 from Paul Eggert eggert at gnu dot org --- Comment on attachment 32831 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=32831 Clarify documentation for __attribute__ ((malloc)). Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 210629) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,10 @@ +2014-05-20 Paul Eggert egg...@cs.ucla.edu + + PR other/56955 + * doc/extend.texi (Function Attributes): Fix __attribute__ ((malloc)) + documentation; the old documentation didn't clearly state the + constraints on the contents of the pointed-to storage. + 2014-05-19 David Wohlferd d...@limegreensocks.com * doc/extend.texi: Create Label Attributes section, Index: gcc/doc/extend.texi === --- gcc/doc/extend.texi(revision 210629) +++ gcc/doc/extend.texi(working copy) @@ -3207,15 +3207,20 @@ @item malloc @cindex @code{malloc} attribute -The @code{malloc} attribute is used to tell the compiler that a function -may be treated as if any non-@code{NULL} pointer it returns cannot -alias any other pointer valid when the function returns and that the memory -has undefined content. -This often improves optimization. -Standard functions with this property include @code{malloc} and -@code{calloc}. @code{realloc}-like functions do not have this -property as the memory pointed to does not have undefined content. +This tells the compiler that a function is @code{malloc}-like, i.e., +that if the function returns a non-null pointer @var{P}, then @var{P} +cannot alias any other pointer valid when the function returns, and +moreover the contents of any storage addressed by @var{P} cannot +contain a pointer that aliases any other pointer valid when the +function returns. +Ussing this attribute often improves optimization. Functions like +@code{malloc} and @code{calloc} have this property because they return +a pointer to uninitialized or zeroed-out storage. However, functions +like @code{realloc} do not have this property, as they can return a +pointer to storage containing pointers that alias already-valid +pointers. + @item mips16/nomips16 @cindex @code{mips16} attribute @cindex @code{nomips16} attribute
[Bug other/56955] documentation for attribute malloc contradicts itself
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 Paul Eggert eggert at gnu dot org changed: What|Removed |Added Attachment #32831|0 |1 is obsolete|| --- Comment #9 from Paul Eggert eggert at gnu dot org --- Created attachment 32832 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=32832action=edit Revised documentation patch for __attribute__ ((malloc)) Thanks for the quick review. Revised patch attached. Sorry about the noise in my previous reply, I hit Submit by accident.
[Bug other/56955] documentation for attribute malloc contradicts itself
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 --- Comment #10 from Rich Felker bugdal at aerifal dot cx --- I don't see how it's at all helpful for GCC to assume that memory obtained by __attribute__((__malloc__)) functions does not contain pointers to anything that existed before the call. This assumption only aids optimization in the case where a pointer residing in the obtained memory is used (e.g. dereferenced or compared with another pointer) before anything is stored to it. But with GCC's assumption, such use would be UB anyway and thus cannot occur in a correct program, so there's no sense in optimizing it. The alternative is much more reasonable: assume that a pointer residing in the obtained memory could alias any object whose address has already escaped (roughly, anything but automatic or static/internal-linkage objects whose addresses were not taken and passed to code the compiler can't see). This allows __attribute__((__malloc__)) to be applied to realloc-like functions as well as functions in third-party libraries which allocate non-opaque structures whose members may point to data that's also accessible via other paths. And as far as I can tell, it doesn't preclude any optimizations that could take place in a code path that doesn't invoke UB.
[Bug other/56955] documentation for attribute malloc contradicts itself
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 --- Comment #11 from Paul Eggert eggert at gnu dot org --- Created attachment 32834 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=32834action=edit Sample illustrating GCC's optimization with __attribute__ ((malloc))
[Bug other/56955] documentation for attribute malloc contradicts itself
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 --- Comment #12 from Paul Eggert eggert at gnu dot org --- (In reply to Rich Felker from comment #10) This assumption only aids optimization in the case where a pointer residing in the obtained memory is used (e.g. dereferenced or compared with another pointer) before anything is stored to it. No, it also aids optimization because GCC can infer lack of aliasing elsewhere, even if no pointer in the newly allocated memory is used-before-set. Consider the contrived example am.c (which I've added as an attachment to this report). It has two functions f and g that differ only in that f calls m which has __attribute__ ((malloc)) whereas g calls n which does not. With the weaker assumption you're suggesting, GCC could not optimize away the reload from a-next in f, because of the intervening assignment '*p = q'. I've compiled this with both GCC 4.9.0 and Clang 3.4 on x86-64 with -O2. Both compile g to essentially the same 15 instructions. Clang, which I suspect uses the weaker assumption, compiles f to 14 instructions; GCC compiles f to 11 instructions.
[Bug other/56955] documentation for attribute malloc contradicts itself
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 --- Comment #13 from Dan Gohman sunfish at mozilla dot com --- (In reply to Paul Eggert from comment #12) (In reply to Rich Felker from comment #10) This assumption only aids optimization in the case where a pointer residing in the obtained memory is used (e.g. dereferenced or compared with another pointer) before anything is stored to it. No, it also aids optimization because GCC can infer lack of aliasing elsewhere, even if no pointer in the newly allocated memory is used-before-set. Consider the contrived example am.c (which I've added as an attachment to this report). It has two functions f and g that differ only in that f calls m which has __attribute__ ((malloc)) whereas g calls n which does not. With the weaker assumption you're suggesting, GCC could not optimize away the reload from a-next in f, because of the intervening assignment '*p = q'. Actually, GCC and Clang both eliminate the reload of a-next in f (and not in g). The weaker assumption is sufficient for that. *p can't alias a or b without violating the weaker assumption. What GCC is additionally doing in f is deleting the stores to a-next and b-next as dead stores. That's really clever. However, the weaker assumption is actually sufficient for that too: First, forward b to eliminate the load of a-next. Then, it can be proved that a doesn't escape, and is defined by an attribute malloc function, so the stores through it can't be loaded anywhere else, so they're dead. Then, it can be proved that b doesn't escape either, and is also defined by an attribute malloc function, so the stores through it are dead too. Consequently, the weaker assumption is still fairly strong. Further, the weaker assumption would be usable by a much broader set of functions, so it may even provide overall stronger alias information in practice.
[Bug other/56955] documentation for attribute malloc contradicts itself
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 --- Comment #14 from Paul Eggert eggert at gnu dot org --- (In reply to Dan Gohman from comment #13) *p can't alias a or b without violating the weaker assumption. Sorry, you've lost me there. Pointers in realloc'ed storage can alias already-existing pointers, and surely this aliasing can inhibit optimization; perhaps my example doesn't show this correctly but I expect that other examples could. the weaker assumption would be usable by a much broader set of functions, so it may even provide overall stronger alias information in practice. From what I've seen so far I tend to agree, but there are contrary opinions, e.g., Richard Biener's We cannot use the malloc attribute on realloc, ever. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23383#23. It might be helpful for Richard to chime in here and explain. Even if GCC does not change the meaning of __attribute__ ((malloc)), perhaps there should be a new directive (__attribute__ ((new)), say) with the weaker semantics. Anyway, the original bug report is about GCC's documentation, and the proposed patch does fix the documentation bug, so we could start with it.
[Bug other/56955] documentation for attribute malloc contradicts itself
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 davidxl at google dot com changed: What|Removed |Added CC||davidxl at google dot com --- Comment #5 from davidxl at google dot com 2013-04-26 19:16:31 UTC --- The documentation gives the most strict semantic for the attribute where calloc is not qualified if NULL is a valid pointer. However GCC's implementation for the attribute is more relaxed but pessimizes simple cases: http://gcc.gnu.org/ml/gcc-patches/2010-02/msg00381.html David (In reply to comment #3) (In reply to comment #2) (In reply to comment #1) I think it is talking about the memory returned by malloc/calloc will not point to another memory location while realloc can. I agree that's essentially what it ought to talk about, and the bug is that it's talking about something else -- the contents of the pointed-to memory. Well, it _is_ actually about the content. There must be no way to compute a valid pointer to another object from the contents of the pointed-to memory. So if you initialize the memory to {0, 1, 2, 3, 4 ...} thus every possible byte value is somewhere and then do void *p = (void *)(mem[3] 24 | mem[58] 16 | ...); then points-to analysis assumes that from the contents of 'mem' you can only compute pointers to nothing (NULL). Technically for targets where NULL is a valid poiner to an object calloc () may not be marked with malloc. That is, read it in the way that the code assumes the memory _may_ be zero-initialized (but only zero-initialized) or uninitialized.
[Bug other/56955] documentation for attribute malloc contradicts itself
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 --- Comment #3 from Richard Biener rguenth at gcc dot gnu.org 2013-04-15 10:19:22 UTC --- (In reply to comment #2) (In reply to comment #1) I think it is talking about the memory returned by malloc/calloc will not point to another memory location while realloc can. I agree that's essentially what it ought to talk about, and the bug is that it's talking about something else -- the contents of the pointed-to memory. Well, it _is_ actually about the content. There must be no way to compute a valid pointer to another object from the contents of the pointed-to memory. So if you initialize the memory to {0, 1, 2, 3, 4 ...} thus every possible byte value is somewhere and then do void *p = (void *)(mem[3] 24 | mem[58] 16 | ...); then points-to analysis assumes that from the contents of 'mem' you can only compute pointers to nothing (NULL). Technically for targets where NULL is a valid poiner to an object calloc () may not be marked with malloc. That is, read it in the way that the code assumes the memory _may_ be zero-initialized (but only zero-initialized) or uninitialized.
[Bug other/56955] documentation for attribute malloc contradicts itself
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 --- Comment #4 from Dan Gohman sunfish at google dot com 2013-04-15 14:53:06 UTC --- (In reply to comment #3) Well, it _is_ actually about the content. There must be no way to compute a valid pointer to another object from the contents of the pointed-to memory. Oh wow. That's a subtlety that completely escaped me. So if you initialize the memory to {0, 1, 2, 3, 4 ...} thus every possible byte value is somewhere and then do void *p = (void *)(mem[3] 24 | mem[58] 16 | ...); then points-to analysis assumes that from the contents of 'mem' you can only compute pointers to nothing (NULL). Is that example fundamentally different than something like this: void *q = (void *)(mem[0] + 0xb1ab1ab1a); In both cases, the information of the pointer value is in the expression, not in the memory. Is it the case that the memory must be either actually zeros or uninitialized? Or could it contain other data which merely transmits no information about pointer values? Technically for targets where NULL is a valid poiner to an object calloc () may not be marked with malloc. That is, read it in the way that the code assumes the memory _may_ be zero-initialized (but only zero-initialized) or uninitialized. If this is what it means, then I request that the text be updated to say this. I'd be willing to propose a wording, once I understand the intent, if that'd be helpful. What should we say about the fact that GLIBC uses the malloc attribute on strdup (and similar things)? strdup actually could be used to transmit information about pointer values.
[Bug other/56955] documentation for attribute malloc contradicts itself
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 --- Comment #1 from Andrew Pinski pinskia at gcc dot gnu.org 2013-04-14 19:05:00 UTC --- I think it is talking about the memory returned by malloc/calloc will not point to another memory location while realloc can.
[Bug other/56955] documentation for attribute malloc contradicts itself
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 --- Comment #2 from Dan Gohman sunfish at google dot com 2013-04-14 19:47:42 UTC --- (In reply to comment #1) I think it is talking about the memory returned by malloc/calloc will not point to another memory location while realloc can. I agree that's essentially what it ought to talk about, and the bug is that it's talking about something else -- the contents of the pointed-to memory.