On 7/7/24 14:42, Alejandro Colomar wrote:
On Sun, Jul 07, 2024 at 12:42:51PM GMT, Paul Eggert wrote:
Also, “global variables” is not
right here. The C standard allows strtol, for example, to read and write an
internal static cache. (Yes, that would be weird, but it’s allowed.)

That's not part of the API.  A user must not access internal static cache.

Although that is true in the normal (sane) case, as an extension an implementation can make such a static cache visible to the user, and in that case the caller must not pass cache addresses as arguments to strtol.

For other functions this point is not purely academic. For example, the C standard specifies the signature "FILE *fopen(const char *restrict, const char *restrict);". If I understand your argument correctly, it says that the "restrict"s can be omitted there without changing the set of valid programs. But that can't be right, as omitting the "restrict"s would make the following code valid on any platform where sizeof(int)>1:

   char *p = (char *) &errno;
   p[0] = 'r';
   p[1] = 0;
   FILE *f = fopen (p, p);

even though the current standard says this code is invalid.


“endptr access(write_only) ... *endptr access(none)”

This is true for glibc, but it’s not necessarily true for all conforming
strtol implementations. If endptr is non-null, a conforming strtol
implementation can both read and write *endptr;

It can't, I think.  It's perfectly valid to pass an uninitialized
endptr, which means the callee must not read the original value.

Sure, but the callee can do something silly like "*endptr = p + 1; *endptr = *endptr - 1;". That is, it can read *endptr after writing it, without any undefined behavior. (And if the callee is written in assembly language, it can read *endptr even before writing it, but I digress.)
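To make this concrete, here is a minimal sketch, assuming a hypothetical wrapper named silly_strtol (not code from glibc or any other libc). It delegates the conversion to the real strtol and then stores and re-reads *endptr exactly as described above; nothing in it has undefined behavior, even though it reads *endptr.

```c
#include <stdlib.h>

/* Hypothetical strtol variant: same contract as strtol, but it reads
   *endptr after writing it, which is permitted. */
long silly_strtol(const char *restrict nptr, char **restrict endptr,
                  int base)
{
    char *end;
    long val = strtol(nptr, &end, base);   /* real conversion */

    if (endptr) {
        *endptr = end + 1;       /* write *endptr (end + 1 is at most one
                                    past the terminating null byte) */
        *endptr = *endptr - 1;   /* read it back and adjust: no UB */
    }
    return val;
}
```

A caller cannot tell this apart from an ordinary strtol, which is why "*endptr is never read" is not a safe description of every conforming implementation.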

The point is that it is not correct to say that *endptr cannot be read from; it can. Similarly for **endptr.


Here, we need to consider two separate objects.  The object pointed-to
by *endptr _before_ the object pointed to by endptr is written to, and
the object pointed-to by *endptr _after_ the object pointed to by endptr
is written to.

Those are not the only possibilities. The C standard also permits strtol to set *endptr to some other pointer value, not pointing anywhere into the string being scanned, so long as it sets *endptr correctly before it returns.


“The caller knows that errno doesn’t alias any of the function arguments.”

Only because all args are declared with ‘restrict’. So if the proposal is
accepted, the caller doesn’t necessarily know that.

Not really.  The caller has created the string (or has received it via a
restricted pointer)

v0.2 doesn't state the assumption that the caller either created the string or received it via a restricted pointer. If this assumption were stated clearly, that would address the objection here.


“The callee knows that *endptr is not accessed.”

This is true for glibc, but not necessarily true for every conforming strtol
implementation.

The original *endptr may be uninitialized, and so must not be accessed.

**endptr can be read once the callee sets *endptr. **endptr can even be written, if the callee temporarily sets *endptr to point to a writable buffer; admittedly this would be weird but it's allowed.
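A sketch of that weird-but-allowed strategy follows; it also illustrates the earlier point that *endptr may temporarily hold a pointer that does not point into the string at all. The name scratch_strtol and its local scratch buffer are hypothetical, and the real strtol does the actual conversion.

```c
#include <stdlib.h>

/* Hypothetical strtol variant that writes through **endptr by first
   parking *endptr on a private, writable buffer. */
long scratch_strtol(const char *restrict nptr, char **restrict endptr,
                    int base)
{
    char scratch[1];   /* automatic storage, so no data race between threads */
    char *end;

    if (endptr) {
        *endptr = scratch;   /* *endptr now points outside the string */
        **endptr = 'x';      /* so writing **endptr touches only scratch */
    }
    long val = strtol(nptr, &end, base);
    if (endptr)
        *endptr = end;       /* the correct value, set before returning */
    return val;
}
```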


“It might seem that it’s a problem that the callee doesn’t know if nptr can
alias errno or not. However, the callee will not write to the latter
directly until it knows it has failed,”

Again this is true for glibc, but not necessarily true for every conforming
strtol implementation.

An implementation is free to set errno = EDEADLK in the middle of it, as
long as it later removes that.  However, I don't see how it would make
any sense.

It could make sense in some cases. Here the spec is a bit tricky, but an implementation is allowed to set errno = EINVAL first thing, and then set errno to some other nonzero value if it determines that the arguments are valid. I wouldn't implement strtol that way, but I can see where someone else might do that.
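Here is a sketch of that kind of implementation, using the hypothetical name einval_first_strtol and wrapping the real strtol. It stores EINVAL into errno before looking at the string and takes it back (here by restoring the caller's value) once a conversion has succeeded; it leaves EINVAL in place when no conversion was performed, which POSIX permits. The early store is harmless only because nptr cannot point into errno: if it could, the store would clobber the string before it was read.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical strtol variant that sets errno pessimistically first. */
long einval_first_strtol(const char *restrict nptr, char **restrict endptr,
                         int base)
{
    int saved = errno;
    char *end = (char *) nptr;   /* initialized in case strtol bails out
                                    early (e.g. on an invalid base) */
    long val;

    errno = EINVAL;                  /* written before nptr is examined */
    val = strtol(nptr, &end, base);  /* reads the string after that store */
    if (errno != ERANGE && end != nptr)
        errno = saved;               /* conversion succeeded: undo EINVAL */
    if (endptr)
        *endptr = end;
    return val;
}
```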


Let's find an ISO C function that accepts a non-restrict string:

        int system(const char *string);

Does ISO C constrain implementations to support system((char *)&errno)?
I don't think so.  Maybe it does implicitly because of a defect in the
wording, but even then it's widely understood that it doesn't.

'system' is a special case since the C standard says 'system' can do pretty much anything it likes. That being said, I agree that implementations shouldn't need to support calls like atol((char *) &errno). Certainly the C standard's description of atol, which defines atol's behavior in terms of a call to strtol, means that atol's argument in practice must follow the 'restrict' rules.
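For reference, apart from its behavior on error, the standard describes atol(nptr) as equivalent to strtol(nptr, (char **)NULL, 10). A sketch of that equivalence, under the hypothetical name my_atol, shows why atol's argument ends up governed by strtol's restrict-qualified first parameter:

```c
#include <stdlib.h>

/* Sketch of atol as specified in terms of strtol (error behavior aside).
   nptr is passed straight to strtol's "const char *restrict" parameter. */
long my_atol(const char *nptr)
{
    return strtol(nptr, (char **) NULL, 10);
}
```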

Perhaps we should report this sort of thing as a defect in the standard. It is odd, for example, that fopen's two arguments are both const char *restrict, but system's argument lacks the "restrict".


Why is this change worth making? Real-world programs do not make calls like that.

Because it makes analysis of 'restrict' more consistent.  The obvious
improvement of GCC's analyzer to catch restrict violations will trigger
false positives in normal uses of strtol(3).

v0.2 does not support this line of reasoning. On the contrary, v0.2 suggests that a compiler should diagnose calls like "strtol(p, &p, 0)", which would be wrong as that call is perfectly reasonable.
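For example, the following sketch (the helper name sum_ints is hypothetical) uses the idiom in the most ordinary way; a diagnostic here would be a false positive:

```c
#include <stdlib.h>

/* Sum the integers in s using the common "strtol(p, &p, 0)" idiom.
   strtol reads the characters of the string through nptr and writes the
   pointer object p through endptr; those are distinct objects, so the
   call is well defined. */
long sum_ints(const char *s)
{
    char *p = (char *) s;
    long sum = 0;

    for (;;) {
        char *start = p;
        long n = strtol(p, &p, 0);
        if (p == start)
            break;   /* no conversion: p was set back to start */
        sum += n;
    }
    return sum;
}
```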

Another way to put it: v0.2 does not clearly state the advantages of the proposed change, and in at least one area what it states as an advantage would actually be a disadvantage.


“m = strtol(p, &p, 0); An analyzer more powerful than the current ones
could extend the current -Wrestrict diagnostic to also diagnose this case.”

Why would an analyzer want to do that? This case is a perfectly normal thing
to do and it has well-defined behavior.

Because without an analyzer, restrict cannot emit many useful
diagnostics.  It's a qualifier that's all about data flow analysis, and
normal diagnostics aren't able to do that.

A qualifier that enables optimizations but doesn't enable diagnostics is
quite dangerous, and probably better not used.  If, however, the analyzer
emits advanced diagnostics for misuses of it, then it's a good
qualifier.

Sorry, but I don't understand what you're trying to say here. Really, I can't make heads or tails of it. As-is, 'restrict' can be useful both for optimization and for generating diagnostics, and GCC does both of these things right now even if you don't use -fanalyzer.
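As a small illustration of both uses, here is a sketch with a hypothetical function add_into. The restrict qualifiers let the compiler assume the arrays do not overlap when optimizing the loop, and GCC's -Wrestrict (enabled by -Wall in C) is documented to warn when an argument passed to a restrict-qualified parameter is aliased by another argument, with no -fanalyzer involved. The exact wording of any warning is not reproduced here.

```c
#include <stddef.h>

/* restrict promises that dst and src do not overlap, so the compiler may
   keep src values in registers or vectorize the loop. */
void add_into(int *restrict dst, const int *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}

/* A call such as
       add_into(a, a, n);
   passes the same object to both restrict-qualified parameters while
   dst is written, so it has undefined behavior; this is the kind of call
   that -Wrestrict is meant to flag. */
```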

Perhaps adding an example or two would help explain your point. But they'd need to be better examples than what's in v0.2 because v0.2 is unclear about this quality-of-diagnostics issue, as it relates to strtol.
