Re: WG14 paper for removing restrict from nptr in strtol(3)

Alejandro Colomar via Gcc Tue, 09 Jul 2024 10:37:15 -0700

Hi Paul,

On Tue, Jul 09, 2024 at 02:09:24PM GMT, Paul Eggert wrote:
> On 7/8/24 00:52, Alejandro Colomar wrote:
> > a small set of functions
> > accept pointers that alias each other, but one of them is never
> > accessed; in those few cases, restrict was added to the parameters in
> > ISO C, but I claim it would be better removed.
> 
> Are these aliasing pointers the nptr and initial *endptr of strtol?


Yes.

> That is,
> are you saying that last line in the following example, which is currently
> invalid, should become valid and should be implementable as ‘end = s; long l
> = 0;’?

No.  I don't think this is a consequence of the previous statement.

> 
>    char *end;
>    char *s = (char *) &end;
>    *s = '\0';
>    long l = strtol (s, &end, 0);
> 
> If so, I fail to see the motivation for the proposed change, as nobody
> writes (or should write) code like that. And if not, evidently I
> misunderstand the proposal.

My proposal is:

         long int
        -strtol(const char *restrict nptr, char **restrict endptr, int base);
        +strtol(const char *nptr, char **restrict endptr, int base);

My proposal doesn't make valid the example above.  To make that example
valid, you'd need:

        long int
        strtol(const char *nptr, char **endptr, int base);

Because in the example above, you're aliasing nptr with endptr, not with
*endptr.  Thus, endptr cannot be a restricted pointer for that example
to be valid.
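
For contrast, here is a minimal sketch of the ordinary case, where nptr
aliases the initial *endptr (not endptr itself).  The helper name
sum_longs() and its interface are hypothetical, made up only for this
illustration, and overflow is deliberately not handled:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: sum the whitespace-separated decimal integers
 * at the start of s, and store how many were parsed in *count.
 * Overflow is not checked in this sketch. */
long
sum_longs(const char *s, size_t *count)
{
        char *p = (char *) s;  /* strtol() never writes through nptr */
        long sum = 0;

        assert(s != NULL && count != NULL);
        *count = 0;
        for (;;) {
                char *start = p;

                /* nptr has the same value as the initial *endptr (both
                 * are p), but strtol() only stores into *endptr. */
                long n = strtol(p, &p, 10);

                if (p == start)
                        break;  /* nothing was converted: stop */
                sum += n;
                (*count)++;
        }
        return sum;
}
```

Called as sum_longs("10 20 -5", &n), this walks the string with repeated
strtol(p, &p, 10) calls, which is the kind of call I don't want an
analyzer to warn about.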

[... snip ...]

I'm not sure I understood that part, but it's probably a consequence of
the misunderstanding from above.  Let's ignore it for now, and please
resend if you think it's still a concern.

> 
> > Maybe I should use abstract names for the objects, to avoid confusing
> > them with the pointer variables that are used to pass them?
> 
> That might help, yes, since v0.2 is unclear on this point.

Ok; will do.

> > this formal
> > definition is quite unreadable, though.  The more I read it, the less
> > sure I am about it.
> 
> Yes, it’s lovely isn’t it? One must understand what the C committee
> intended in order to read and understand that part of the standard.

:-)

> >     If L is used to access the value of the object X that it
> >     designates, and X is also modified (by any means), then the
> >     following requirements apply: T shall not be const-qualified
> >
> > This reads to me as "const variables are not writable when they are
> > accessed via a restricted pointer; casting away is not enough".  Am I
> > reading this correctly?
> 
> In that quoted statement, the restricted pointer is not allowed to be
> pointer-to-const. However, I’m not quite sure what your question means, as
> the phrase “const variables” does not appear in the standard. Perhaps give
> an example to clarify the question?

I should have said

"An object pointed to by a pointer-to-const cannot be written if the
pointer is a restricted one; casting const away is not enough."

Is this interpretation of restrict correct?

> >> an implementation is allowed to set errno = EINVAL first thing, and then set
> >> errno to some other nonzero value if it determines that the arguments are
> >> valid. I wouldn't implement strtol that way, but I can see where someone
> >> else might do that.
> >
> > In any case an implementation is not obliged to pessimize strtol(3).  It
> > is only allowed to.  Should we not allow them to do so?
> 
> Of course the standard should allow suboptimal implementations. However, I’m
> not sure what the point of the question is. The “errno = EINVAL first thing”
> comment says that removing ‘restrict’ obliges the implementation to support
> obviously-bogus calls like strtol(&errno, ...), which might make the
> implementation less efficient.

See for example how musl implements strtol(3):

$ grepc strtox src/stdlib/strtol.c
src/stdlib/strtol.c:static unsigned long long strtox(const char *s, char **p, int base, unsigned long long lim)
{
        FILE f;
        sh_fromstring(&f, s);
        shlim(&f, 0);
        unsigned long long y = __intscan(&f, base, 1, lim);
        if (p) {
                size_t cnt = shcnt(&f);
                *p = (char *)s + cnt;
        }
        return y;
}

The work is done within __intscan(), which could be prototyped as

        hidden unsigned long long
        __intscan(FILE *restrict, unsigned, int, unsigned long long);

And now you're able to optimize internally, since thanks to that helper
function you know it doesn't alias errno, regardless of the external
API.
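
As a rough sketch of that structure (this is not musl's code; the names
scan_digits() and parse_udec() are hypothetical and the parsing is
reduced to unsigned decimal digits), the public function carries no
restrict on the string, while the internal helper is free to declare
whatever aliasing promises the implementation wants:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical internal helper: restrict appears only here, on the
 * internal prototype.  Overflow wraps around in this sketch. */
static unsigned long
scan_digits(const char *restrict s, size_t *restrict len)
{
        unsigned long y = 0;
        size_t i = 0;

        while (isdigit((unsigned char) s[i])) {
                y = y * 10 + (unsigned long) (s[i] - '0');
                i++;
        }
        *len = i;  /* number of characters consumed */
        return y;
}

/* Hypothetical public entry point: s is not restrict-qualified, so
 * parse_udec(p, &p) is an ordinary call from the caller's view. */
unsigned long
parse_udec(const char *s, char **endp)
{
        size_t len;
        unsigned long y;

        assert(s != NULL);
        y = scan_digits(s, &len);
        if (endp != NULL)
                *endp = (char *) s + len;
        return y;
}
```

The helper writes only to a local of the wrapper, so the external
prototype doesn't constrain what the helper may assume.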


BTW, now I remember that strtol(3) says:

ERRORS
     This function does not modify errno on success.

Which means that setting errno at function start wouldn't make much
sense.  Although there's probably a contrived way of doing it while still
being conformant (plus, I think ISO C doesn't say that about errno).
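
Since errno is left alone on success, callers that want to detect range
errors usually clear it themselves before the call.  A minimal sketch,
assuming a hypothetical wrapper parse_long() that returns 0 on success
and -1 on failure, and that doesn't reject trailing junk:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical wrapper: parse a decimal long from the start of s.
 * Returns 0 and stores the value in *out on success; returns -1 if
 * no digits were found or the value is out of range. */
int
parse_long(const char *s, long *out)
{
        char *end;
        long n;

        assert(s != NULL && out != NULL);
        errno = 0;  /* strtol() won't reset it for us on success */
        n = strtol(s, &end, 10);
        if (end == s)
                return -1;  /* no conversion was performed */
        if (errno == ERANGE)
                return -1;  /* value doesn't fit in a long */
        *out = n;
        return 0;
}
```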

> I don’t see how the question is relevant to
> that comment.
> 
> 
> > Let's take a simpler one: rename(2).  Is it allowed to receive &errno?
> > Hopefully not.
> 
> I agree with that hope, but the current C standard seems to allow it. I
> think we both agree this is a defect in the standard.

Yup.  :)

> >>>> Why is this change worth
> >>>> making? Real-world programs do not make calls like that.
> >>>
> >>> Because it makes analysis of 'restrict' more consistent.  The obvious
> >>> improvement of GCC's analyzer to catch restrict violations will trigger
> >>> false positives in normal uses of strtol(3).
> >>
> >> v0.2 does not support this line of reasoning. On the contrary, v0.2 suggests
> >> that a compiler should diagnose calls like "strtol(p, &p, 0)", which would
> >> be wrong as that call is perfectly reasonable.
> >
> > That call is perfectly reasonable, which is why I suggest that the
> > standard should modify the prototype so that strtol(p, &p, 0), which is
> > a reasonable call, should not be warned by a compiler that would
> > diagnose such calls.
> 
> Of course they shouldn’t warn. But where are these compilers?
> 
> v0.2 asserts that “An analyzer more powerful than the current ones could
> extend the current -Wrestrict diagnostic to also diagnose this case.” But
> why would an analyzer want to do that? v0.2 doesn’t say.

True.

> The proposal merely asks to change prototypes for the C standard functions
> strtol, strtoul, etc. But if that is the only change needed then why bother?
> C compilers already do special-case analysis for functions defined by the C
> standard, and they can suppress undesirable diagnostics for these special
> cases.
> 
> If you’ve identified a more general problem with ‘restrict’ then welcome to
> the club! The experts already know it’s confusing and limited, and are
> discussing whether and how to improve things in the next C standard. I
> am sure you’d be welcome to those discussions.

Thanks!  I'm thinking I'll drop my proposal and redirect it into
replacing restrict with something better.

> > That is, just by reading the prototypes:
> >
> >     void foo(int *restrict x, int **restrict p);
> >
> > and
> >
> >     void bar(int *x, int **restrict endp);
> >
> > one should be able to determine that
> >
> >     foo(p, &p);
> >
> > is probably causing UB (and thus trigger a warning) but
> >
> >     bar(p, &p);
> >
> > is fine.
> 
> Sure, but this is a discussion we should be having with the compiler
> writers, no?
> 
> Is this the main motivation for the proposal?

Yep.

> If so, how would weakening the
> spec for strtol etc. affect that discussion with the compiler writers? v0.2
> does not make this clear.
> 
> 
> >> Another way to put it: v0.2 does not clearly state the advantages of the
> >> proposed change, and in at least one area what it states as an advantage
> >> would actually be a disadvantage.
> >
> > The advantage is having more information in the caller.  As a caller, I
> > want to distinguish calls where it's ok to pass pointers that alias, and
> > where not.  And I want my compiler to be able to help me there.
> 
> I’m still not understanding. Removing ‘restrict’ from strtol’s first arg
> gives the caller less information, not more.

Actually, the caller seems to have perfect information about strtol(3),
regardless of restrict.  (As long as strtol(3) uses gnu access attributes.)

However, in this paragraph I wasn't talking about strtol(3) specifically,
but in general: if a caller knows whether two arguments to a function are
allowed to alias just by seeing the uses of restrict in the prototype, it
can turn on strict diagnostics about it to catch UB.
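
A minimal sketch of a function in the style of bar() from my earlier
message (the body is hypothetical, invented only for illustration): x is
not restrict-qualified, so the prototype alone tells the caller that x
may have the same value as the initial *endp.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical body for bar(): skip leading zero elements and report
 * where the first non-zero element is.  The array must contain a
 * non-zero element, or the loop runs off the end. */
void
bar(int *x, int **restrict endp)
{
        assert(x != NULL && endp != NULL);
        while (*x == 0)
                x++;
        *endp = x;  /* the only access to the object endp points to */
}
```

With int a[] = {0, 0, 5, 0}; int *p = a; the call bar(p, &p) leaves p
pointing at a[2], and nothing in the prototype suggests that a
diagnostic would be appropriate there.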

> > I'd rather have a simple analyzer, which will provide for
> > less false positives and negatives.
> 
> The C committee appears to have the opposite opinion, as when they were
> asked about this matter they added Examples 5 through 7 to what is now
> §6.7.4.2 (Formal definition of restrict). These examples say that Example 2
> (which uses ‘restrict’ on all arguments) is the simplest and most effective
> way to use ‘restrict’, even though a smarter compiler can still make some
> good inferences when some pointer args are ‘restrict’ and others are merely
> pointers to const.
> 
> If the proposal is disagreeing with Examples 5 through 7, this point needs
> to be thoroughly discussed in the proposal.

I now think that restrict is a dead end, and must be replaced by
something better.

> > GCC can only catch the most obvious violations of restrict.
> 
> Yes, but I fail to see how changing the API for strtol etc. would improve
> that situation.
> 
> 
> >     #include <string.h>
> >
> >     typedef struct {
> >             int x;
> >     } T;
> >
> >     [[gnu::access(read_only, 1)]]
> >     [[gnu::access(read_only, 2)]]
> >     void
> >     replace(T *restrict *restrict ls, const T *restrict new, size_t pos)
> >     {
> >             memcpy(ls[pos], new, sizeof(T));
> >     }
> >
> >     void
> >     f(T *restrict *restrict ls)
> >     {
> >             replace(ls, ls[0], 1);
> >     }
> >
> >     $ gcc-14 -Wall -Wextra -fanalyzer replace.c -S
> >     $
> >
> > The above program causes UB,
> 
> It’s not a complete program and I don’t see the undefined behavior.

I should have said s/program/code/

> If
> behavior is undefined because it violates the [[gnu::access(...)]]
> restrictions,

It does not violate the gnu::access restrictions.  It actually only
reads the objects pointed to by ls and new.  The object that is written
to is the one pointed to by *ls, but that's fine.

When I wrote it, I was thinking that the behavior was undefined because
the object pointed to by *ls is aliased by the object pointed to by new.
However, it is not UB; I forgot that restrict doesn't care if the
pointer aliases; it only cares if an access does alias, which does not
happen.  Let's s/0/1/ in that code to make it UB.

If you s/0/1/ in my code, it is UB.  I'd like a substitute for restrict
to reject that code because both new and ls are derived from the same
pointer in the caller.  That is, I'd like passing two references to the
same object to be UB, via some attribute, regardless of accesses.  More
or less what Rust does, but opt-in, in a controlled way.

> that is not the sort of example that would convince the C
> standardization committee; they’d want to see a standard C program.
> 
> I tried to write a standard C program to illustrate the issue, and came up
> with the following.

[...]

Have a lovely day!
Alex

-- 
<https://www.alejandro-colomar.es/>
