Re: WG14 paper for removing restrict from nptr in strtol(3)

Paul Eggert Sun, 07 Jul 2024 10:31:34 -0700

On 7/7/24 14:42, Alejandro Colomar wrote:

On Sun, Jul 07, 2024 at 12:42:51PM GMT, Paul Eggert wrote:

Also, “global variables” is not
right here. The C standard allows strtol, for example, to read and write an
internal static cache. (Yes, that would be weird, but it’s allowed.)


That's not part of the API.  A user must not access internal static
cache

Although true in the normal (sane) case, as an extension the
implementation can make such a static cache visible to the user, and in
this case the caller must not pass cache addresses as arguments to strtol.

For other functions this point is not purely academic. For example, the
C standard specifies the signature "FILE *fopen(const char *restrict,
const char *restrict);". If I understand your argument correctly, it
says that the "restrict"s can be omitted there without changing the set
of valid programs. But that can't be right, as omitting the "restrict"s
would make the following code valid on any platform where sizeof(int)>1:


   char *p = (char *) &errno;
   p[0] = 'r';
   p[1] = 0;
   FILE *f = fopen (p, p);

even though the current standard says this code is invalid.

“endptr access(write_only) ... *endptr access(none)”

This is true for glibc, but it’s not necessarily true for all conforming
strtol implementations. If endptr is non-null, a conforming strtol
implementation can both read and write *endptr;


It can't, I think.  It's perfectly valid to pass an uninitialized
endptr, which means the callee must not read the original value.

Sure, but the callee can do something silly like "*endptr = p + 1;
*endptr = *endptr - 1;". That is, it can read *endptr after writing it,
without any undefined behavior. (And if the callee is written in
assembly language it can read *endptr even before writing it - but I
digress.)

The point is that it is not correct to say that *endptr cannot be read
from; it can. Similarly for **endptr.
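
To make the "write, then read back" case concrete, here is a minimal
sketch of such a callee. The name toy_strtol is hypothetical, and the
function is only an illustration: it handles unsigned decimal digits and
ignores whitespace, signs, bases and overflow, so it is not a conforming
strtol.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy parser, NOT a real strtol: decimal digits only,
   no sign, no base handling, no overflow detection. */
long toy_strtol(const char *nptr, char **endptr)
{
    const char *p = nptr;
    long val = 0;

    while (*p >= '0' && *p <= '9') {
        val = val * 10 + (*p - '0');
        p++;
    }

    if (endptr != NULL) {
        *endptr = (char *) p + 1;  /* write a temporary value ...       */
        *endptr = *endptr - 1;     /* ... then read it back and fix it  */
        assert(*endptr == p);      /* one more read, after the write    */
    }
    return val;
}
```

The original value of *endptr is never read, so passing an uninitialized
endptr object stays fine; every read happens after the callee's own write.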

Here, we need to consider two separate objects.  The object pointed-to
by *endptr _before_ the object pointed to by endptr is written to, and
the object pointed-to by *endptr _after_ the object pointed to by endptr
is written to.

Those are not the only possibilities. The C standard also permits strtol
to set *endptr to some other pointer value, not pointing anywhere into
the string being scanned, so long as it sets *endptr correctly before it
returns.

“The caller knows that errno doesn’t alias any of the function arguments.”

Only because all args are declared with ‘restrict’. So if the proposal is
accepted, the caller doesn’t necessarily know that.


Not really.  The caller has created the string (or has received it via a
restricted pointer)

v0.2 doesn't state the assumption that the caller either created the
string or received it via a restricted pointer. If this assumption were
stated clearly, that would address the objection here.

“The callee knows that *endptr is not accessed.”

This is true for glibc, but not necessarily true for every conforming strtol
implementation.


The original *endptr may be uninitialized, and so must not be accessed.

**endptr can be read once the callee sets *endptr. **endptr can even be
written, if the callee temporarily sets *endptr to point to a writable
buffer; admittedly this would be weird but it's allowed.

“It might seem that it’s a problem that the callee doesn’t know if nptr can
alias errno or not. However, the callee will not write to the latter
directly until it knows it has failed,”

Again this is true for glibc, but not necessarily true for every conforming
strtol implementation.


An implementation is free to set errno = EDEADLK in the middle of it, as
long as it later removes that.  However, I don't see how it would make
any sense.

It could make sense in some cases. Here the spec is a bit tricky, but an
implementation is allowed to set errno = EINVAL first thing, and then
set errno to some other nonzero value if it determines that the
arguments are valid. I wouldn't implement strtol that way, but I can see
where someone else might do that.

Let's find
an ISO C function that accepts a non-restrict string:

        int system(const char *string);

Does ISO C constrain implementations to support system((char *)&errno)?
I don't think so.  Maybe it does implicitly because of a defect in the
wording, but even then it's widely understood that it doesn't.

'system' is a special case since the C standard says 'system' can do
pretty much anything it likes. That being said, I agree that
implementations shouldn't need to support calls like atol((char *)
&errno). Certainly the C standard's description of atol, which defines
atol's behavior in terms of a call to strtol, means that atol's argument
in practice must follow the 'restrict' rules.

Perhaps we should report this sort of thing as a defect in the standard.
It is odd, for example, that fopen's two arguments are both const char
*restrict, but system's argument lacks the "restrict".

Why is this change worth
making? Real-world programs do not make calls like that.


Because it makes analysis of 'restrict' more consistent.  The obvious
improvement of GCC's analyzer to catch restrict violations will trigger
false positives in normal uses of strtol(3).

v0.2 does not support this line of reasoning. On the contrary, v0.2
suggests that a compiler should diagnose calls like "strtol(p, &p, 0)",
which would be wrong as that call is perfectly reasonable.

Another way to put it: v0.2 does not clearly state the advantages of the
proposed change, and in at least one area what it states as an advantage
would actually be a disadvantage.
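
For reference, this is the sort of ordinary code where "strtol(p, &p, 0)"
shows up. It is a minimal sketch only: sum_numbers is a hypothetical
helper, and it ignores overflow and errno for brevity.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: adds up the whitespace-separated integers at the
   start of s, stopping at the first token strtol cannot convert.
   Overflow and errno are deliberately ignored in this sketch. */
long sum_numbers(const char *s)
{
    assert(s != NULL);

    char *p = (char *) s;  /* strtol never writes through nptr */
    long total = 0;

    for (;;) {
        char *start = p;
        long m = strtol(p, &p, 0);  /* nptr is p, endptr is &p */
        if (p == start)             /* no conversion was performed */
            break;
        total += m;
    }
    return total;
}
```

The value of p is copied into the nptr parameter before the call, and
strtol then stores the end position into the object p, so each iteration
simply advances p past the number it just parsed.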

“m = strtol(p, &p, 0); An analyzer more powerful than the current ones
could extend the current -Wrestrict diagnostic to also diagnose this case.”

Why would an analyzer want to do that? This case is a perfectly normal thing
to do and it has well-defined behavior.


Because without an analyzer, restrict cannot emit many useful
diagnostics.  It's a qualifier that's all about data flow analysis, and
normal diagnostics aren't able to do that.

A qualifier that enables optimizations but doesn't enable diagnostics is
quite dangerous, and probably better not used.  If however, the analyzer
emits advanced diagnostics for misuses of it, then it's a good
qualifier.

Sorry, but I don't understand what you're trying to say here. Really, I
can't make heads or tails of it. As-is, 'restrict' can be useful both
for optimization and for generating diagnostics, and GCC does both of
these things right now even if you don't use -fanalyzer.

Perhaps adding an example or two would help explain your point. But
they'd need to be better examples than what's in v0.2 because v0.2 is
unclear about this quality-of-diagnostics issue, as it relates to strtol.
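
As an illustration of 'restrict' driving diagnostics without -fanalyzer,
here is a small sketch built around the kind of overlapping copy that
GCC's -Wrestrict documentation uses as its example (strcpy (a, a + 4) on
an eight-character string). The helper drop_prefix is hypothetical, and
whether a given call is actually diagnosed depends on the GCC version and
on what the compiler can see at compile time.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: removes the first n bytes of the string in buf,
   in place. Requires n <= strlen(buf). */
void drop_prefix(char *buf, size_t n)
{
    assert(n <= strlen(buf));

    /* Writing this as strcpy(buf, buf + n) would hand overlapping
       objects to strcpy's restrict-qualified parameters, which is
       undefined behavior; a constant-offset call of that shape is what
       the -Wrestrict documentation describes as diagnosed. memmove is
       specified to handle overlap, so it is used instead. */
    memmove(buf, buf + n, strlen(buf + n) + 1);
}
```

By contrast, "strtol(p, &p, 0)" involves no such overlap between the
string and the pointer object, so it is not the same kind of call.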
