Re: Missed warning (-Wuse-after-free)

2023-02-24 Thread Serge E. Hallyn
On Fri, Feb 24, 2023 at 02:42:32AM +0100, Alex Colomar wrote:
> Hi Serge, Martin,
> 
> On 2/24/23 02:21, Serge E. Hallyn wrote:
> > > Does all this imply that the following is well defined behavior (and shall
> > > print what one would expect)?
> > > 
> > >   free(p);
> > > 
> > >   (void) &p;  // take the address
> > >   // or maybe we should (void) memcmp(&p, &p, sizeof(p)); ?
> > > 
> > >   printf("%p\n", p);  // we took previously its address,
> > >                       // so now it has to hold consistently
> > >                       // the previous value
> > > 
> > > 
> > > This feels weird.  And a bit of a Schroedinger's pointer.  I'm not entirely
> > > convinced, but might be.
> > 
> > Again, p is just an n byte variable which happens to have (one hopes)
> > pointed at a previously malloc'd address.
> > 
> > And I'd argue that pre-C11, this was not confusing, and would not have
> > felt weird to you.
> > 
> > But I am most grateful to you for having brought this to my attention.
> > I may not agree with it and not like it, but it's right there in the
> > spec, so time for me to adjust :)
> 
> I'll try to show why this feels weird to me (even in C89):
> 
> 
> alx@dell7760:~/tmp$ cat pointers.c
> #include <stdio.h>
> #include <stdlib.h>
> 
> 
> int
> main(void)
> {
>   char  *p, *q;
> 
>   p = malloc(42);
>   if (p == NULL)
>       exit(1);
> 
>   q = realloc(p, 42);
>   if (q == NULL)
>       exit(1);
> 
>   (void) &p;  // If we remove this, we get -Wuse-after-free

(which I would argue is a bug in the compiler)

>   printf("(%p == %p) = %i\n", p, q, (p == q));
> }
> alx@dell7760:~/tmp$ cc -Wall -Wextra pointers.c  -Wuse-after-free=3
> alx@dell7760:~/tmp$ ./a.out
> (0x5642cd9022a0 == 0x5642cd9022a0) = 1
> 
> 
> These pointers point to different objects (actually, one of them doesn't even
> point to an object anymore), so they can't compare equal, according to both:
> 
> <http://port70.net/%7Ensz/c/c11/n1570.html#6.5.9p6>
> 
> <http://port70.net/~nsz/c/c89/c89-draft.html#3.3.9>
> 
> (I believe C89 already had the concept of lifetime well defined as it is
> now, so the object had finished its lifetime after realloc(3)).
> 
> How can we justify that result being true, if the pointers don't point to
> the same object?

Because what's pointed to does not matter.

You are comparing the memory address p, not the contents of the memory address.

By way of analogy, if I do

   mkdir -p /tmp/1/a
   ln -s /tmp/1 /tmp/2
   rm -rf /tmp/1

then /tmp/2 is still a symlink.  'stat /tmp/2' still works and is well
defined.  And if I create a new /tmp/1, then /tmp/2 starts pointing to
that.  Yes, reusing p like that is a very bad idea in many cases :)
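
One way to ask "did the block move?" without reading p after realloc(3)
is to compare integer copies of the addresses.  A minimal sketch, assuming
the implementation provides uintptr_t (it is optional in the standard);
realloc_moved() and its variable names are hypothetical and not from this
thread:

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: reports whether realloc(3) moved a block, without
 * reading the old pointer after the call.  Both sizes are assumed nonzero.
 * Returns 1 if the block moved, 0 if it stayed in place, -1 on failure.
 */
int
realloc_moved(size_t old_size, size_t new_size)
{
	char       *p, *q;
	uintptr_t  old;
	int        moved;

	p = malloc(old_size);
	if (p == NULL)
		return -1;

	/* Convert while p is still valid; the integer copy is an ordinary
	   value that realloc(3) cannot invalidate. */
	old = (uintptr_t) p;

	q = realloc(p, new_size);
	if (q == NULL) {
		free(p);  /* realloc(3) failed, so p is still valid here */
		return -1;
	}

	/* Two integers are compared; p is never read after realloc(3). */
	moved = (old != (uintptr_t) q);

	free(q);
	return moved;
}
```

realloc_moved(42, 42) yields 0 or 1 depending on the allocator.  Whether
equal integers say anything about the pointers themselves is what this
thread is debating; the sketch only avoids reading p once it may be
invalid.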

> And how can we justify a hypothetical false (which compilers don't
> implement), if compilers will really just read the value?  To implement this
> as well defined behavior, it could result in no other than false, and it
> would require heavy overhead for the compilers to detect that the
> seemingly-equal values are indeed different, don't you think?  The easiest
> solution is for the standard to just outlaw this, IMO.
> 
> Maybe it could make an exception for printing, that is, reading a pointer is
> not a problem in itself, as long as you don't compare it, but I'm not enough
> of an expert on this.
> 
> Cheers,
> 
> Alex
> 
> > 
> > -serge
> 
> -- 
> <http://www.alejandro-colomar.es/>
> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
> 

Re: Missed warning (-Wuse-after-free)

2023-02-24 Thread Serge E. Hallyn
On Fri, Feb 24, 2023 at 09:36:45AM +0100, Martin Uecker wrote:
> Am Donnerstag, dem 23.02.2023 um 19:21 -0600 schrieb Serge E. Hallyn:
> > On Fri, Feb 24, 2023 at 01:02:54AM +0100, Alex Colomar wrote:
> > > Hi Martin,
> > > 
> > > On 2/23/23 20:57, Martin Uecker wrote:
> > > > Am Donnerstag, dem 23.02.2023 um 20:23 +0100 schrieb Alex Colomar:
> > > > > Hi Martin,
> > > > > 
> > > > > On 2/17/23 14:48, Martin Uecker wrote:
> > > > > > > This new wording doesn't even allow one to use memcmp(3);
> > > > > > > just reading the pointer value, however you do it, is UB.
> > > > > > 
> > > > > > memcmp would not use the pointer value but work
> > > > > > on the representation bytes and is still allowed.
> > > > > 
> > > > > > Hmm, interesting.  It's rather unspecified behavior. Still
> > > > > > unpredictable: (memcmp(&p, &p, sizeof(p)) == 0) might evaluate to
> > > > > > true or false randomly; the compiler may compile out the call to
> > > > > > memcmp(3), since it knows it won't produce any observable behavior.
> > > > > 
> > > > > <https://software.codidact.com/posts/287905>
> > > > 
> > > > No, I think several things get mixed up here.
> > > > 
> > > > The representation of a pointer that becomes invalid
> > > > does not change.
> > > > 
> > > > > So (0 == memcmp(&p, &p, sizeof(p))) always
> > > > > evaluates to true.
> > > > 
> > > > Also in general, an unspecified value is simply unspecified
> > > > but does not change anymore.
> > 
> > Right.  p is its own thing - n bytes on the stack containing some value.
> > Once it comes into scope, it doesn't change on its own.  And if I do
> > free(p) or o = realloc(p), then the value of p itself - the n bytes on
> > the stack - does not change.
> 
> Yes, but one comment about terminology: the C standard
> differentiates between the representation, i.e. the bytes on
> the stack, and the value.  The representation is converted to
> a value during lvalue conversion.  For an invalid pointer
> the representation is indeterminate because it now does not
> point to a valid object anymore.  So it is not possible to
> convert the representation to a value during lvalue conversion.
> In other words, it does not make sense to speak of the value
> of the pointer anymore.

I'm sure there are, especially from an implementer's point of view,
great reasons for this.

However, as just a user, the "value" of 'void *p' should absolutely
not be tied to whatever is at that address.  I'm given a simple
linear memory space, under which sits an entirely different view
obfuscated by page tables, but that doesn't concern me.  If I say
void *p = -1, then if I print p, I expect to see that value.

Since I'm complaining about standards I'm picking and choosing here,
but I'll still point at the printf(3) manpage :)  :

   p  The void * pointer argument is printed in hexadecimal (as if by %#x
      or %#lx).

> > I realize C11 appears to have changed that.  I fear that in doing so it
> > actually risks increasing the confusion about pointers.  IMO it's much
> > easier to reason about
> > 
> > o = realloc(p, X);
> > 
> > (and more baroque constructions) when keeping in mind that o, p, and the
> > object pointed to by either one are all different things.
> > 
> 
> What did change in C11? As far as I know, the pointer model
> did not change in C11.

I haven't looked in more detail, and don't really plan to, but my
understanding is that the text of:

  The lifetime of an object is the portion of program execution during
  which storage is guaranteed to be reserved for it. An object exists, has
  a constant address, and retains its last-stored value throughout its
  lifetime. If an object is referred to outside of its lifetime, the
  behavior is undefined. The value of a pointer becomes indeterminate when
  the object it points to (or just past) reaches the end of its lifetime.

(especially the last sentence) was new.

Maybe the words "value of a pointer" don't mean what I think they
mean.  But that's the phrase to which I object.  The n bytes on
the stack, p, are not changed just because something happened with
the accounting for the memory at the address represented by that
value.  If they do, then that's not 'C' any more.
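
The distinction between the stored bytes and the "value" can be made
concrete by copying the bytes of p out through unsigned char, which does
not go through lvalue conversion of p itself.  A minimal sketch, assuming
the reading quoted above that the representation of an invalidated pointer
does not change; representation_survives_free() is a hypothetical name:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the bytes stored in p are the same
 * before and after free(3), 0 if they differ, -1 if malloc(3) fails.
 */
int
representation_survives_free(void)
{
	char           *p;
	unsigned char  before[sizeof(p)], after[sizeof(p)];

	p = malloc(42);
	if (p == NULL)
		return -1;

	memcpy(before, &p, sizeof(p));  /* snapshot the bytes of p */
	free(p);
	memcpy(after, &p, sizeof(p));   /* copies bytes; p itself is not read */

	return memcmp(before, after, sizeof(p)) == 0;
}
```

Only the arrays before[] and after[] are compared, so the sketch never
uses the pointer value after free(3).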

> > > > Reading an uninitialized valu

Re: Missed warning (-Wuse-after-free)

2023-02-23 Thread Serge E. Hallyn
On Fri, Feb 24, 2023 at 01:02:54AM +0100, Alex Colomar wrote:
> Hi Martin,
> 
> On 2/23/23 20:57, Martin Uecker wrote:
> > Am Donnerstag, dem 23.02.2023 um 20:23 +0100 schrieb Alex Colomar:
> > > Hi Martin,
> > > 
> > > On 2/17/23 14:48, Martin Uecker wrote:
> > > > > This new wording doesn't even allow one to use memcmp(3);
> > > > > just reading the pointer value, however you do it, is UB.
> > > > 
> > > > memcmp would not use the pointer value but work
> > > > on the representation bytes and is still allowed.
> > > 
> > > Hmm, interesting.  It's rather unspecified behavior. Still
> > > > unpredictable: (memcmp(&p, &p, sizeof(p)) == 0) might evaluate to true or
> > > false randomly; the compiler may compile out the call to memcmp(3),
> > > since it knows it won't produce any observable behavior.
> > > 
> > > 
> > 
> > No, I think several things get mixed up here.
> > 
> > The representation of a pointer that becomes invalid
> > does not change.
> > 
> > So (0 == memcmp(&p, &p, sizeof(p))) always
> > evaluates to true.
> > 
> > Also in general, an unspecified value is simply unspecified
> > but does not change anymore.

Right.  p is its own thing - n bytes on the stack containing some value.
Once it comes into scope, it doesn't change on its own.  And if I do
free(p) or o = realloc(p), then the value of p itself - the n bytes on
the stack - does not change.

I realize C11 appears to have changed that.  I fear that in doing so it
actually risks increasing the confusion about pointers.  IMO it's much
easier to reason about

o = realloc(p, X);

(and more baroque constructions) when keeping in mind that o, p, and the
object pointed to by either one are all different things.
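
Keeping o and p apart is also the usual way to call realloc(3) safely: the
result goes into a separate variable, and p is overwritten only on
success, so the stale bytes are never read.  A minimal sketch;
resize_buffer() is a hypothetical helper, not something proposed in this
thread:

```c
#include <stdlib.h>

/*
 * Hypothetical helper: resizes *pp to n bytes (n assumed nonzero).
 * Returns 0 on success, -1 on failure.  On failure *pp is left untouched
 * and still points to the original, still-allocated block.
 */
int
resize_buffer(char **pp, size_t n)
{
	char  *o;

	o = realloc(*pp, n);
	if (o == NULL)
		return -1;  /* the old block was not freed */

	/* Overwrite the stale bytes right away, so nothing can read the
	   old pointer value afterwards. */
	*pp = o;
	return 0;
}
```

A caller would write something like "if (resize_buffer(&p, 64) == -1)"
and handle the error, with p usable either way.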

> > Reading an uninitialized value of automatic storage whose
> > address was not taken is undefined behavior, so everything
> > is possible afterwards.
> > 
> > An uninitialized variable whose address was taken has a
> > representation which can represent an unspecified value
> > or a no-value (trap) representation. Reading the
> > representation itself is always ok and gives consistent
> > results. Reading the variable can be undefined behavior
> > iff it is a trap representation, otherwise you get
> > the unspecified value which is stored there.
> > 
> > At least this is my reading of the C standard. Compilers
> > are not fully conformant.
> 
> Does all this imply that the following is well defined behavior (and shall
> print what one would expect)?
> 
>   free(p);
> 
>   (void) &p;  // take the address
>   // or maybe we should (void) memcmp(&p, &p, sizeof(p)); ?
> 
>   printf("%p\n", p);  // we took previously its address,
>   // so now it has to hold consistently
>   // the previous value
> 
> 
> This feels weird.  And a bit of a Schroedinger's pointer.  I'm not entirely
> convinced, but might be.

Again, p is just an n byte variable which happens to have (one hopes)
pointed at a previously malloc'd address.

And I'd argue that pre-C11, this was not confusing, and would not have
felt weird to you.

But I am most grateful to you for having brought this to my attention.
I may not agree with it and not like it, but it's right there in the
spec, so time for me to adjust :)

-serge