On 7/8/24 00:52, Alejandro Colomar wrote:
> a small set of functions
> accept pointers that alias each other, but one of them is never
> accessed; in those few cases, restrict was added to the parameters in
> ISO C, but I claim it would be better removed.

Are these aliasing pointers the nptr and initial *endptr of strtol? That is, are you saying that last line in the following example, which is currently invalid, should become valid and should be implementable as ‘end = s; long l = 0;’?

   char *end;
   char *s = (char *) &end;
   *s = '\0';
   long l = strtol (s, &end, 0);

If so, I fail to see the motivation for the proposed change, as nobody writes (or should write) code like that. And if not, evidently I misunderstand the proposal.


> the small set of functions where this happens don't seem to use any state,
> so we don't need to care about implementations using internal buffers
> that are passed somehow to the user.

For strtol (nptr, endptr, 10) evidently the state that’s of concern is the value of *endptr. But here it’s possible that an implementation could use that state, even if the source code of the implementation's strtol does not. For example, suppose the key part of the base 10 implementation in strtol.c is this:

    bool overflow = false;
    long int n = 0;
    for (; '0' <= *nptr && *nptr <= '9'; nptr++)
      {
        overflow |= ckd_mul (&n, n, 10);
        overflow |= ckd_add (&n, n, *nptr - '0');
      }
    *endptr = (char *) nptr;
    ... more code goes here ...

Currently, on typical platforms where CHAR_WIDTH < INT_WIDTH and INT_WIDTH == UINT_WIDTH, the C standard lets the compiler to compile this code as if it were the following instead.

    bool overflow = false;
    long int n = 0;
    *endptr = (char *) nptr;
    unsigned int digit = *nptr++ - '0';
    if (digit <= 9)
      {
        n = digit;
        while ((digit = *nptr++ - '0') <= 9)
          {
            overflow |= ckd_mul (&n, n, 10);
            overflow |= ckd_add (&n, n, digit);
          }
        *endptr = (char *) nptr;
      }
    ... more code goes here ...

This sort of thing might make sense on some architectures. However, the proposed change would not allow this optimization, because it’s invalid when nptr points into *endptr.

For strtol I suppose this is not that big a deal; strtol is kinda slow anyway so who cares if it’s a bit slower? But surely we wouldn’t want to give up even this minor performance win unless we get something in return, and I’m still not seeing what we get in return.


> Maybe I should use abstract names for the objects, to avoid confusing
> them with the pointer variables that are used to pass them?

That might help, yes, since v0.2 is unclear on this point.


> this formal
> definition is quite unreadable, though.  The more I read it, the less
> sure I am about it.

Yes, it’s lovely isn’t it? One must understand what the C committee
intended in order to read and understand that part of the standard.


>    If L is used to access the value of the object X that it
>    designates, and X is also modified (by any means), then the
>    following requirements apply: T shall not be const-qualified
>
> This reads to me as "const variables are not writable when they are
> accessed via a restricted pointer; casting away is not enough".  Am I
> reading this correctly?

In that quoted statement, the restricted pointer is not allowed to be pointer-to-const. However, I’m not quite sure what your question means, as the phrase “const variables” does not appear in the standard. Perhaps give an example to clarify the question?


>> an implementation is allowed to set errno = EINVAL first thing, and then set >> errno to some other nonzero value if it determines that the arguments are
>> valid. I wouldn't implement strtol that way, but I can see where someone
>> else might do that.
>
> In any case an implementation is not obliged to pessimize strtol(3).  It
> is only allowed to.  Should we not allow them to do so?

Of course the standard should allow suboptimal implementations. However, I’m not sure what the point of the question is. The “errno = EINVAL first thing” comment says that removing ‘restrict’ obliges the implementation to support obviously-bogus calls like strtol(&errno, ...), which might make the implementation less efficient. I don’t see how the question is relevant to that comment.


> Let's take a simpler one: rename(2).  Is it allowed to receive &errno?
> Hopefully not.

I agree with that hope, but the current C standard seems to allow it. I think we both agree this is a defect in the standard.


>>>> Why is this change worth
>>>> making? Real-world programs do not make calls like that.
>>>
>>> Because it makes analysis of 'restrict' more consistent.  The obvious
>>> improvement of GCC's analyzer to catch restrict violations will trigger
>>> false positives in normal uses of strtol(3).
>>
>> v0.2 does not support this line of reasoning. On the contrary, v0.2 suggests >> that a compiler should diagnose calls like "strtol(p, &p, 0)", which would
>> be wrong as that call is perfectly reasonable.
>
> That call is perfectly, reasonable, which is why I suggest that the
> standard should modify the prototype so that strtol(p, &p, 0), which is
> a reasonable call, should not be warned by a compiler that would
> diagnose such calls.

Of course they shouldn’t warn. But where are these compilers?

v0.2 asserts that “An analyzer more powerful than the current ones could extend the current -Wrestrict diagnostic to also diagnose this case.” But why would an analyzer want to do that? v0.2 doesn’t say.

The proposal merely asks to change prototypes for the C standard functions strtol, strtoul, etc. But if that is the only change needed then why bother? C compilers already do special-case analysis for functions defined by the C standard, and they can suppress undesirable diagnostics for these special cases.

If you’ve identified a more general problem with ‘restrict’ then welcome to the club! The experts already know it’s confusing and limited, and are discussing about whether and how to improve things in the next C standard. I am sure you’d be welcome to those discussions.


> That is, just by reading the prototypes:
>
>    void foo(int *restrict x, int **restrict p);
>
> and
>
>    void bar(int *x, int **restrict endp);
>
> one should be able to determine that
>
>    foo(p, &p);
>
> is probably causing UB (and thus trigger a warning) but
>
>    bar(p, &p);
>
> is fine.

Sure, but this is a discussion we should be having with the compiler writers, no?

Is this the main motivation for the proposal? If so, how would weakening the spec for strtol etc. affect that discussion with the compiler writers? v0.2 does not make this clear.


>> Another way to put it: v0.2 does not clearly state the advantages of the
>> proposed change, and in at least one area what it states as an advantage
>> would actually be a disadvantage.
>
> The advantage is having more information in the caller.  As a caller, I
> want to distinguish calls where it's ok to pass pointers that alias, and
> where not.  And I want my compiler to be able to help me there.

I’m still not understanding. Removing ‘restrict’ from strtol’s first arg gives the caller less information, not more.


> I'd rather have a simple analyzer, which will provide for
> less false positives and negatives.

The C committee appears to have the opposite opinion, as when they were asked about this matter they added Examples 5 through 7 to what is now §6.7.4.2 (Formal definition of restrict). These examples say that Example 2 (which uses ‘restrict’ on all arguments) is the simplest and most effective way to use ‘restrict’, even though a smarter compiler can still make some good inferences when some pointer args are ‘restrict’ and others are merely pointers to const.

If the proposal is disagreeing with Examples 5 through 7, this point needs to be thoroughly discussed in the proposal.


> GCC can only catch the most obvious violations of restrict.

Yes, but I fail to see how changing the API for strtol etc. would improve that situation.


>    #include <string.h>
>
>    typedef struct {
>            int x;
>    } T;
>
>    [[gnu::access(read_only, 1)]]
>    [[gnu::access(read_only, 2)]]
>    void
>    replace(T *restrict *restrict ls, const T *restrict new, size_t pos)
>    {
>            memcpy(ls[pos], new, sizeof(T));
>    }
>
>    void
>    f(T *restrict *restrict ls)
>    {
>            replace(ls, ls[0], 1);
>    }
>
>    $ gcc-14 -Wall -Wextra -fanalyzer replace.c -S
>    $
>
> The above program causes UB,

It’s not a complete program and I don’t see the undefined behavior. If behavior is undefined because it violates the [[gnu::access(...)]] restrictions, that is not the sort of example that would convince the C standardization committee; they’d want to see a standard C program.

I tried to write a standard C program to illustrate the issue, and came up with the following.

  #include <string.h>

  typedef struct { int x; } T;

  static void
  replace (T *restrict const *restrict ls,
           T const *restrict new, size_t pos)
  {
    memcpy (ls[pos], new, sizeof (T));
  }

  static void
  f (T *restrict const *restrict ls)
  {
    replace (ls, ls[0], 1);
  }

  int
  main ()
  {
    T u = {100}, v = {200}, *a[2] = {&u, &v};
    f (a);
    return a[0]->x ^ a[1]->x;
  }

However, I still don’t see undefined behavior there. gcc -O2 compiles this as if it were ‘int main () { return 0; }’ which is the only possible correct behavior.


> writing an analyzer that triggers on this code, it
> will trigger on strtol(3) too.

Sorry, I’m still not following the motivation for the proposed change. It appears to be something like “if we removed ‘restrict’ from strtol etc’s first arg, compilers could generate better ‘restrict’ diagnostics everywhere” but none if this is clear or making sense to me. And if I’m missing the point I have little doubt that the C committee will miss it too.

Reply via email to