Re: WG14 paper for removing restrict from nptr in strtol(3)

Paul Eggert Tue, 09 Jul 2024 05:09:51 -0700

On 7/8/24 00:52, Alejandro Colomar wrote:
> a small set of functions
> accept pointers that alias each other, but one of them is never
> accessed; in those few cases, restrict was added to the parameters in
> ISO C, but I claim it would be better removed.

Are these aliasing pointers the nptr and initial *endptr of strtol? Thatis, are you saying that last line in the following example, which iscurrently invalid, should become valid and should be implementable as‘end = s; long l = 0;’?


   char *end;
   char *s = (char *) &end;
   *s = '\0';
   long l = strtol (s, &end, 0);

If so, I fail to see the motivation for the proposed change, as nobodywrites (or should write) code like that. And if not, evidently Imisunderstand the proposal.

> the small set of functions where this happens don't seem to use anystate,

> so we don't need to care about implementations using internal buffers
> that are passed somehow to the user.

For strtol (nptr, endptr, 10) evidently the state that’s of concern isthe value of *endptr. But here it’s possible that an implementationcould use that state, even if the source code of the implementation'sstrtol does not. For example, suppose the key part of the base 10implementation in strtol.c is this:


    bool overflow = false;
    long int n = 0;
    for (; '0' <= *nptr && *nptr <= '9'; nptr++)
      {
        overflow |= ckd_mul (&n, n, 10);
        overflow |= ckd_add (&n, n, *nptr - '0');
      }
    *endptr = (char *) nptr;
    ... more code goes here ...

Currently, on typical platforms where CHAR_WIDTH < INT_WIDTH andINT_WIDTH == UINT_WIDTH, the C standard lets the compiler to compilethis code as if it were the following instead.


    bool overflow = false;
    long int n = 0;
    *endptr = (char *) nptr;
    unsigned int digit = *nptr++ - '0';
    if (digit <= 9)
      {
        n = digit;
        while ((digit = *nptr++ - '0') <= 9)
          {
            overflow |= ckd_mul (&n, n, 10);
            overflow |= ckd_add (&n, n, digit);
          }
        *endptr = (char *) nptr;
      }
    ... more code goes here ...

This sort of thing might make sense on some architectures. However, theproposed change would not allow this optimization, because it’s invalidwhen nptr points into *endptr.

For strtol I suppose this is not that big a deal; strtol is kinda slowanyway so who cares if it’s a bit slower? But surely we wouldn’t want togive up even this minor performance win unless we get something inreturn, and I’m still not seeing what we get in return.



> Maybe I should use abstract names for the objects, to avoid confusing
> them with the pointer variables that are used to pass them?

That might help, yes, since v0.2 is unclear on this point.


> this formal
> definition is quite unreadable, though.  The more I read it, the less
> sure I am about it.

Yes, it’s lovely isn’t it? One must understand what the C committee
intended in order to read and understand that part of the standard.


>    If L is used to access the value of the object X that it
>    designates, and X is also modified (by any means), then the
>    following requirements apply: T shall not be const-qualified
>
> This reads to me as "const variables are not writable when they are
> accessed via a restricted pointer; casting away is not enough".  Am I
> reading this correctly?

In that quoted statement, the restricted pointer is not allowed to bepointer-to-const. However, I’m not quite sure what your question means,as the phrase “const variables” does not appear in the standard. Perhapsgive an example to clarify the question?

>> an implementation is allowed to set errno = EINVAL first thing, andthen set>> errno to some other nonzero value if it determines that thearguments are

>> valid. I wouldn't implement strtol that way, but I can see where someone
>> else might do that.
>
> In any case an implementation is not obliged to pessimize strtol(3).  It
> is only allowed to.  Should we not allow them to do so?

Of course the standard should allow suboptimal implementations. However,I’m not sure what the point of the question is. The “errno = EINVALfirst thing” comment says that removing ‘restrict’ obliges theimplementation to support obviously-bogus calls like strtol(&errno,...), which might make the implementation less efficient. I don’t seehow the question is relevant to that comment.



> Let's take a simpler one: rename(2).  Is it allowed to receive &errno?
> Hopefully not.

I agree with that hope, but the current C standard seems to allow it. Ithink we both agree this is a defect in the standard.



>>>> Why is this change worth
>>>> making? Real-world programs do not make calls like that.
>>>
>>> Because it makes analysis of 'restrict' more consistent.  The obvious
>>> improvement of GCC's analyzer to catch restrict violations will trigger
>>> false positives in normal uses of strtol(3).
>>

>> v0.2 does not support this line of reasoning. On the contrary, v0.2suggests>> that a compiler should diagnose calls like "strtol(p, &p, 0)", whichwould

>> be wrong as that call is perfectly reasonable.
>
> That call is perfectly, reasonable, which is why I suggest that the
> standard should modify the prototype so that strtol(p, &p, 0), which is
> a reasonable call, should not be warned by a compiler that would
> diagnose such calls.

Of course they shouldn’t warn. But where are these compilers?

v0.2 asserts that “An analyzer more powerful than the current ones couldextend the current -Wrestrict diagnostic to also diagnose this case.”But why would an analyzer want to do that? v0.2 doesn’t say.

The proposal merely asks to change prototypes for the C standardfunctions strtol, strtoul, etc. But if that is the only change neededthen why bother? C compilers already do special-case analysis forfunctions defined by the C standard, and they can suppress undesirablediagnostics for these special cases.

If you’ve identified a more general problem with ‘restrict’ then welcometo the club! The experts already know it’s confusing and limited, andare discussing about whether and how to improve things in the next Cstandard. I am sure you’d be welcome to those discussions.



> That is, just by reading the prototypes:
>
>    void foo(int *restrict x, int **restrict p);
>
> and
>
>    void bar(int *x, int **restrict endp);
>
> one should be able to determine that
>
>    foo(p, &p);
>
> is probably causing UB (and thus trigger a warning) but
>
>    bar(p, &p);
>
> is fine.

Sure, but this is a discussion we should be having with the compilerwriters, no?

Is this the main motivation for the proposal? If so, how would weakeningthe spec for strtol etc. affect that discussion with the compilerwriters? v0.2 does not make this clear.



>> Another way to put it: v0.2 does not clearly state the advantages of the
>> proposed change, and in at least one area what it states as an advantage
>> would actually be a disadvantage.
>
> The advantage is having more information in the caller.  As a caller, I
> want to distinguish calls where it's ok to pass pointers that alias, and
> where not.  And I want my compiler to be able to help me there.

I’m still not understanding. Removing ‘restrict’ from strtol’s first arggives the caller less information, not more.



> I'd rather have a simple analyzer, which will provide for
> less false positives and negatives.

The C committee appears to have the opposite opinion, as when they wereasked about this matter they added Examples 5 through 7 to what is now§6.7.4.2 (Formal definition of restrict). These examples say thatExample 2 (which uses ‘restrict’ on all arguments) is the simplest andmost effective way to use ‘restrict’, even though a smarter compiler canstill make some good inferences when some pointer args are ‘restrict’and others are merely pointers to const.

If the proposal is disagreeing with Examples 5 through 7, this pointneeds to be thoroughly discussed in the proposal.



> GCC can only catch the most obvious violations of restrict.

Yes, but I fail to see how changing the API for strtol etc. wouldimprove that situation.



>    #include <string.h>
>
>    typedef struct {
>            int x;
>    } T;
>
>    [[gnu::access(read_only, 1)]]
>    [[gnu::access(read_only, 2)]]
>    void
>    replace(T *restrict *restrict ls, const T *restrict new, size_t pos)
>    {
>            memcpy(ls[pos], new, sizeof(T));
>    }
>
>    void
>    f(T *restrict *restrict ls)
>    {
>            replace(ls, ls[0], 1);
>    }
>
>    $ gcc-14 -Wall -Wextra -fanalyzer replace.c -S
>    $
>
> The above program causes UB,

It’s not a complete program and I don’t see the undefined behavior. Ifbehavior is undefined because it violates the [[gnu::access(...)]]restrictions, that is not the sort of example that would convince the Cstandardization committee; they’d want to see a standard C program.

I tried to write a standard C program to illustrate the issue, and cameup with the following.


  #include <string.h>

  typedef struct { int x; } T;

  static void
  replace (T *restrict const *restrict ls,
           T const *restrict new, size_t pos)
  {
    memcpy (ls[pos], new, sizeof (T));
  }

  static void
  f (T *restrict const *restrict ls)
  {
    replace (ls, ls[0], 1);
  }

  int
  main ()
  {
    T u = {100}, v = {200}, *a[2] = {&u, &v};
    f (a);
    return a[0]->x ^ a[1]->x;
  }

However, I still don’t see undefined behavior there. gcc -O2 compilesthis as if it were ‘int main () { return 0; }’ which is the onlypossible correct behavior.



> writing an analyzer that triggers on this code, it
> will trigger on strtol(3) too.

Sorry, I’m still not following the motivation for the proposed change.It appears to be something like “if we removed ‘restrict’ from strtoletc’s first arg, compilers could generate better ‘restrict’ diagnosticseverywhere” but none if this is clear or making sense to me. And if I’mmissing the point I have little doubt that the C committee will miss it too.

Re: WG14 paper for removing restrict from nptr in strtol(3)

Reply via email to