mktime does not specify EINVAL and should

Geoff Clare via austin-group-l at The Open Group Tue, 13 Dec 2022 08:53:19 -0800

Robert Elz wrote, on 12 Dec 2022:
>
>     Date:        Mon, 12 Dec 2022 12:02:39 +0000
>     From:        "Geoff Clare via austin-group-l at The Open Group" <austin-group-l@opengroup.org>
> 
> C23 is apparently going to have timegm() (the mktime() equivalent for UTC
> instead of localtime).   Using gmtime() modifying the struct tm, and then
> timegm() to get the time_t back would work much better, at least if the
> specification of timegm() is better than that of mktime() (I haven't
> seen it).   I know it is getting very late in the process, but perhaps
> we should also be adding timegm() now.


It is too late to add timegm() in Issue 8.  It will automatically get
added in Issue 9 as that will (presumably) align with C23 or later.

>   | By a strict reading, you may be right, but it is strongly implied by
>   | "shall be set to represent the specified time since the Epoch".
> 
> That's fine when the specified time (that is, the time passed in in *timeptr)
> is a time that exists.

This statement provides a big clue as to why you are misinterpreting the
standard, and why your attitude towards mktime() is so different from
everybody else's.

You are suffering from a misconception that *timeptr somehow "specifies"
a time since the Epoch.  It does not!  It specifies a broken-down time.
The standard describes, in detail (in the paragraph beginning "The
relationship between ..."), how this broken-down time is *converted* to
an integer "time since the Epoch" value.

When the standard says "shall be set to represent the specified time since
the Epoch" it is talking about the integer value that *it* specifies to
be calculated from the broken-down time in *timeptr. It is not in any way
suggesting that *timeptr "specifies" a time since the Epoch.

In trying to treat *timeptr as "specifying" a time since the Epoch, you
are misunderstanding the intention and misinterpreting the meaning of
much of the mktime() text.

Since I mentioned attitudes, I'll explain mine.  It is that mktime()
follows the well-known principle "be liberal in what you accept, and
conservative in what you send" (which originated in relation to
communication protocols but I think applies very well here).
Applying this principle to mktime() means you can give it an
"incorrect" broken-down time and it will make sense of it and give
you back a correct time.  For example:

* If you give it Feb 29 in a non-leap year it treats that as the day
  after Feb 28 and gives you back Mar 1.

* If you give it Feb 0 it treats that as the day before Feb 1 and
  gives you back Jan 31.

* If you give it 21:65 it treats that as 6 minutes after 21:59 and
  gives you back 22:05. 

* If you give it tm_isdst=0 for a time when DST is in effect, it gives
  you back a positive tm_isdst and alters the other fields appropriately.

* If there is a DST transition where 02:00 standard time becomes 03:00
  DST and you give mktime() 02:30 (with negative tm_isdst), it treats
  that as either 30 minutes after 02:00 standard time or 30 minutes
  before 03:00 DST and gives you back a zero or positive tm_isdst,
  respectively, with the tm_hour field altered appropriately.

* If a geographical timezone changes its UTC offset such that "old 00:00"
  becomes "new 00:30" and you give it 00:20, it treats that as either
  20 minutes after "old 00:00" or 10 minutes before "new 00:30", and
  gives you back appropriately altered struct tm fields.

And yes, having listed that last case along with the others, I see no
reason that it should not follow the same principle.  The "treats it as"
wording is much the same as the DST transition case.

Returning -1 for any of these cases violates the "be liberal in what
you accept" part of the principle.

> mday 312, minute -1234, hour 999, second -23456789, year (anything that
> doesn't cause time_t overflow for the implementation) tm_isdst anything
> represents.   If you can find something somewhere that specifies what
> that means, in the C or POSIX standards (or just about any other standard
> you care to reference) then great.   mktime() allows that input, but I
> see nothing that says which particular time_t value should be returned.
> 
> You might be imagining how an implementation might deal with this, as can
> I, the two might even be the same - but it is certainly not specified
> anywhere.

I agree it's not clear for pathological cases like that.  It comes down
to this statement:

    the tm_yday value used in the expression is the day of the year
    from 0 to 365 inclusive, calculated from the other tm structure
    members

It may be worth trying to improve this, if implementations all do the
tm_yday calculation the same way, but it has no real relevance in the
matter of whether mktime() can return -1 for "incorrect" broken-down
times.  If it doesn't allow it when all of the tm fields are in their
normal ranges, then it also doesn't allow it when they are outside
those ranges.

>   | In any case, it is being clarified by bug 1613.
> 
> Unless you made more changes there than I thought, no, it isn't.
> The extra text that was added there just says what the returned
> struct tm (in *timeptr) must be, in relationship to the time_t
> returned.   It says nothing at all about how that time_t is selected.

And I didn't claim that it does.  What I said (which you trimmed) was:

    By a strict reading, you may be right, but it is strongly implied by
    "shall be set to represent the specified time since the Epoch".  In
    any case, it is being clarified by bug 1613.

My point was entirely about this "shall be set to represent" text, i.e.
about what the returned struct tm fields must contain.  The context for
this was Don's point that Feb 29 2023 has the tm fields in their stated
ranges and so the standard, as written, allows the returned struct tm
to be left as Feb 29 2023.  The change in bug 1613 requires them to be
set to the values that would be returned by localtime(), so this will
no longer be allowed.

>   | This would definitely not meet the requirement "shall be set to
>   | represent the specified time since the Epoch".
> 
> Of course it could.   If the time passed in contains out of range
> values, there is no defined meaning that can be attributed to them.
> If you can find somewhere where that's stated, then please, enlighten us.

The above quote is all that's needed, provided "the specified time
since the Epoch" is correctly interpreted (which you are not doing).
The time since the Epoch being referred to here is a known integer
value which mktime() is going to return.  The above text requires
mktime() to set the struct tm fields to represent that specific, known,
time since the Epoch value.  (The adjustment to bring struct tm fields
into range is done after this value is known - see below).  The sort of
adjustment you were suggesting, "if (t->tm_sec < 0) t->tm_sec = 0",
etc. would cause the fields to no longer represent that time since the
Epoch.

>   | and then requires (on successful completion) that the fields in the
>   | broken-down time are updated to
>   | "represent the specified time since the Epoch".
> 
> Yes, this part is not controversial.
> 
>   | Your suggested other adjustments would not represent the time since
>   | the Epoch that is going to be returned.
> 
> Of course it would, the adjustments are made to create a struct tm
> that only contains in-range values, and then from that a time_t is
> produced.

No, the adjustment to bring struct tm fields into range is done after
the time since the Epoch value has been calculated.  This is clear just
from the order in which things are described on the mktime() page, but
also from the use of "Upon successful completion", since mktime() can't
know whether it will complete successfully until it has calculated the
time_t value it is going to return.

>   | Huh?  The struct tm values don't need altering in this case (except
>   | for tm_isdst obviously).
> 
> Agreed.   But we need to pick tm_isdst = 0 or tm_isdst = 1, and
> which we pick will alter what time_t value gets returned.   There's
> nothing anywhere that suggests which one should be selected.

Correct, and since the standard is silent on this, either behaviour
is allowed.

>   | > As you indicate, the actual ranges within which the struct tm values are
>   | > "forced" is one which matches values that localtime() would return
>   |
>   | Which is what Issue 8 will require (courtesy of bug 1613).
> 
> No, that's not what that says.   I can see you're presuming that the
> implementation calculates a time_t first, and then adjusts the tm to
> match.   That's not required.

Yes it is.  See above.

>   | Future applications could
>   | check errno, but it would be preferable to disallow the (time_t)-1
>   | return for times in the gap so that existing applications are guaranteed
>   | not to misbehave when ported to any (existing or future) conforming
>   | system.
> 
> But unless it is specified what the result must be in that case,
> applications moved from a system which generates one result might
> fail on one which generates a different one.   You've already demonstrated
> that there are implementations which return different results for these
> times - and you seem to consider that OK.   I don't.

The only real potential for problems here is if an application does
small (less than a day) additions/subtractions using the struct tm
fields and sets tm_isdst=-1.  Then it might work fine on one
implementation but get into the kind of loop you described in an
earlier mail on an implementation that behaves the other way. But, as
I pointed out in that earlier discussion, no real application would
do that.  The fact that no such problems have come to light in the
last 30 years also means that in practice this is a non-problem.

-- 
Geoff Clare <g.cl...@opengroup.org>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England

Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should
