mktime does not specify EINVAL and should

Robert Elz via austin-group-l at The Open Group Tue, 20 Dec 2022 08:32:03 -0800

    Date:        Fri, 16 Dec 2022 17:31:03 +0000
    From:        "Geoff Clare via austin-group-l at The Open Group" 
<austin-group-l@opengroup.org>
    Message-ID:  <Y5yrV7nwvj676P/r@localhost>


  | Before I get into detailed responses, please note this is my last
  | working day before the holiday break, so I won't be contributing to
  | this discussion further until January.

OK.   I think this discussion has mostly reached a dead end anyway,
nothing (relevant to the topic anyway) is changing in any of the
recent messages.

  | I may have misled you a little in the way I worded a previous email.
  | It's not the time_t type itself that you can't do arithmetic on,

No, not misled, I knew what you meant, and I understand that arithmetic
types allow arithmetic, what matters is whether that arithmetic makes
sense in context or not.

  | The description of time() says:
  |
  |     The time function determines the current calendar time. The
  |     encoding of the value is unspecified.

That's what I was missing.   I don't know my way around the C standard,
and don't know where to ask people to look.

  | I agree with "related" but not with "corresponds".  By saying
  | "corresponds" you are assuming that the conversion is reversible,
  | i.e. that it is a one-to-one mapping.  It is not.

No, no such assumption, since mktime() input can have out of range
values, and inverting the time_t that results will never produce that, it
is clear that nothing can (necessarily) be reversed.   If you think that
"corresponds" implied that, then by all means we can pick a different word.

But "related" isn't really strong enough either (Wednesday is related
to Tuesday, it is the following day, but that doesn't mean that if the
input to mktime() specifies a date that is a Tuesday, it is OK to return
the following Wednesday instead, just because they're related).

The mktime() input must specify precisely what time_t value is to be
returned, otherwise the function is useless - calling functions (apart
from random number generators) that return results which are not what
the input requests be returned is a waste of time.

  | > The first thing to note is that this only applies to UTC times.
  | Hence the "corrected for timezone and any seasonal time adjustments"
  | in the preceding mktime() quote.

Yes, but we cannot make that correction until we have a UTC time to
correct, we don't know what correction to apply until after that is
done.   This is something of a dilemma, as the input is given in the
local timezone, but without enough information to allow that correction
to be made, until after we have found the corresponding time_t (UTC)
value (in general, and at the very least, until after we have a properly
in range, and well defined, local time value).

  | If the standard meant local time here it would say "local time".
  | The fact that it instead says "actual time of day" shows that it
  | does *not* mean local time.

I agree it doesn't mean only local time, but recall the actual time of day
is local time.  If the standard said "local time" it could be read as
simply meaning that local time is unspecified (which it largely is)
rather than also meaning that the system's clock is not necessarily
synchronised with that local time (or UTC), which it is also saying.
It means both.   It could require synchronised times (for some applications
that's needed), but doesn't (and shouldn't in general), but it cannot
specify how local time works (which includes how it corresponds to UTC),
that's outside of POSIX's jurisdiction.

  | As quoted above, mktime() first converts the broken-down time to UTC
  | seconds since the Epoch

It can't.   And nothing in the standard says that it should, as that
would be absurd, one cannot convert a local time into a UTC time (however
that is reckoned, here as a count of seconds since the Epoch, but that
detail is irrelevant) without knowing the local timezone information
first.

  | and then corrects it for "timezone and any seasonal time adjustments".

No, that isn't what it says at all.   If it did it would be ridiculous.
But it doesn't.   What it does say (and today I'm quoting from Issue 7 TC2
(2018 edition, ie: c181, not that, other than the page and line numbers,
it makes any difference, this part has not been changed (yet anyway)).

Page 1331, lines 44305-7:

        The mktime( ) function shall convert the broken-down time,
        expressed as local time, in the structure pointed to by timeptr,
        into a time since the Epoch value with the same encoding as that
        of the values returned by time( ).

So clearly, we have a local time as input.   Not disputed I believe.

Then, same page, lines 44315-8

        The relationship between the tm structure (defined in the <time.h>
        header) and the time in seconds since the Epoch is that the result
        shall be as specified in the expression given in the definition of
        seconds since the Epoch (see XBD Section 4.16, on page 113) corrected 
        for timezone and any seasonal time adjustments,

Do you see any "and then" in there?  Rather it says that the result is
as in "the expression in XBD 4.16 corrected for ..."   We need to make the
corrections in order to run that expression, as the expression is only
defined for UTC values.    It has to be, there simply is no simple formula
that can possibly apply to local time values, they're not a continuous mapping.

  | The description of TZ in XBD 8.3 provides
  | the details of what the timezone and seasonal time adjustments are
  | (for values not beginning with colon - remember that we are discussing
  | something I claimed only for those values).

That might be all you want to discuss, but it is *not* what we are discussing.
mktime() needs to work for any TZ setting that can possibly exist, the Issue 7
non-colon form, its colon form, and the new form to be added in Issue 8.

There's no point making it work (or specified) only for the one form of TZ
specification which is functionally useless.   How would an application
ever deal with that?

You can continue limiting yourself to that meaningless case if you like
(though you're no more correct there than in the cases that matter - even
in the old style TZ strings, one must know the (in range) month, mday,
and time of day, to determine whether the summer time, or standard time,
offset applies).

But as long as you keep doing that, everyone else can safely ignore you,
as the result cannot possibly be useful for the standard going forward.

  | Now that I see this text can be misinterpreted, I will include a
  | clarification when I propose wording changes.

Make sure that your clarification is in line with what makes sense,
the time since the Epoch value to be returned must be derived from the
broken down time that is the (indirect) arg to the function (the actual
arg is the pointer to it, but that's an irrelevant detail).

  | I said this was my attitude to the function. I made no claims about
  | the standard.

Fine, but all we are discussing here (or what we should be discussing)
is what the standard currently requires.   Once we know that, we can
determine whether that is correct or not (OK, I think we already agree
that it isn't) - but at that point we know that we are changing the
standard (the written standard) not just clarifying what it said all along.

  | You misunderstand what my quotes implied. What I was trying to say is
  | that mktime() accepts broken-down times that the proverbial
  | man-in-the-street would consider incorrect.

OK, but there was never a need to say that, it was pointless.  We all
know that the input to mktime() can have outlandish values, as would
not be considered a sane local time value by that proverbial man.
That's not useful information to repeat.

  | I was talking about what happens in practice, not about what's specified.

OK, good, we're agreeing that what is in the standard does not
specify what implementations actually do, and now we should be
specifying that as a correction (a complete change really).


  | > But this one is not the de-facto standard.   It is (like the others)
  | > clearly not in any written standard, but here, implementations (as you
  | > know) differ.   This might be what you'd like to see happen, but it is
  | > not necessarily what does happen.
  |
  | I would say it definitely is the de-facto standard.  Something can be
  | a de-facto standard without being universally followed.

That's true.

  | So far we
  | know of dozens of systems that behave the same and only one system
  | (NetBSD) that behaves differently.

But that's not, you yourself showed that FreeBSD and MacOS are also
different:

austin-group-l@opengroup.org (this was you) said:

  | Glibc produced 378663600 and "Thu Dec 31 23:50:00 1981", so it is the
  | same as Solaris except for the change being at the wrong time.

  | FreeBSD and MacOS returned -1.

The "at the wrong time" issue will be fixed now if the relevant systems
tzdata has been updated to (at least) 2022g.     The rest of that, is what you
discovered and reported in your message of Tue, 22 Nov 2022 12:49:13 +0000
(its Message-ID was: <Y3zFSV/bW8mrpgTb@localhost> which is not exactly
compliant with the e-mail standards, "@localhost" is not good enough, but
that doesn't matter here).

  | (Note that I was careful to say
  | "system" not "implementation" - Paul Eggert's C defect report said
  | that the Arthur Olsen implementation behaved differently in 1994, but
  | that is not a "system".)

Doesn't really matter, that is the implementation used, with modifications,
essentially everywhere these days - but all the systems do tend to modify
it.   It is certainly what is in glibc() (modified).

  | It's a shame that while mktime() implementations all do this case the
  | same way, for gap times some give 01:30 and some 03:30.

And some give an error.   You provided the evidence of that (beyond just 
NetBSD).  (And we only know about the few systems that have been tested so
far, there are more).

  | It would have been good to be able to require one of the two results,

[three results]

And yes, it would be, and in this case, the error return is the one that
makes most sense, the others hide the truth.   The same issue applies in
the "fold back" period, but is perhaps worse, as there's no way there for
the application to see something odd happened.

But I agree, since we are amending the standard to fit with the existing
implementations, which all seem to comply with what the existing standard
actually requires (almost nothing) those areas where the implementations
disagree need to be left as unspecified or implementation defined.

  | There's no reason to expect the Feb 29 case to produce a single
  | unambiguous result either.  It is just fortunate that it does.

Yes.   In this regard everyone copied (in one sense or another) the first
implementation of normalising the struct tm values it seems, rather than
doing it their own way.

  | Yes, we'll need to say "it is unspecified whether ..." for the cases
  | where existing implementations do the adjustment differently.

Good, then I think we're agreed, for times in the gap, and in the fold
back period (at least when tm_isdst was -1 in the input struct tm) it
will be unspecified whether a time before the gap, or the first occurrence
of a time in the fold back, or a time after the gap, or the second, or
perhaps later occurrence of a fold back time, or in either case, an error,
is returned, as that's what the various implementations (as that word is
used in POSIX - more or less as a synonym for system) do.

  | You are interpreting the standard incorrectly because of your
  | knowledge of mktime() internal implementation details.  You need to
  | read what the standard actually says and try not to be influenced by
  | that prior knowledge.

Nonsense.   I am reading what the standard actually says, which is
almost nothing.   You seem to be reading it to conform to (and require only)
your expectation of what the implementations should do.

  | Everywhere the mktime() description refers to a struct tm member, it
  | is talking about the value that was passed in to mktime(), with the
  | exception of the last paragraph where it says the values are set on
  | successful completion.

OK.

  | So the values used in the expression from XBD 4.16 (or 4.17 in D2.1)
  | are the values passed in to mktime(), with the exception of tm_yday
  | which is calculated from the other members.

No, what it says is (as quoted above):

        the result
        shall be as specified in the expression [...] corrected 
        for timezone and any seasonal time adjustments,

The expression is corrected for timezone and seasonal time adjustments,
not the result, which you are assuming.   It has to be, the expression
is only defined for UTC.  Read XBD 4.16 (or 4.17 as appropriate) for
yourself, and you cannot come to any other conclusion.  It is clearly for
UTC only.

  | It may well be that some existing implementations do the range
  | corrections in place in *timeptr first and then calculate tm_yday
  | from them.

The "in place" is irrelevant, but this isn't an implementation detail,
there is no way to calculate tm_yday without first knowing what tm_year,
tm_mon, and tm_mday will be.   Once an implementation has determined all
of those values (which also require tm_sec tm_min and tm_hour to first
be determined) there is no rational expectation that all that work will
be thrown away, and done again later.   That beggars belief.   Really!

  | It is permitted by the usual implied "as if" rule, as long as the end
  | result is the same as would result from doing things in the way
  | described in the standard.

Sure.  But the standard does not say what you think it does.   You just
want it to say that.

  | I.e. a value to be used as tm_yday in
  | the expression is calculated from the original field values in
  | *timeptr,

It would be interesting to see you do this, in working code, without
altering those original field values - and have it correctly apply to
timezones where whole days have gone missing in some particular years.
(or where whole days have been repeated, which I am not sure has ever
happened, but easily could).

  | However, it seems to me that if I was going to implement mktime() I
  | would do the range adjustments first

Of course you would, it is the only way that makes sense.

  | but I would not do them in place.
  | Preventing overflow while doing them in place would be a nightmare.

Not unless you're on a system with 8 bit int types (or smaller) - and
even then for struct tm 8 bits is enough for all but tm_yday and tm_year.
But posix requires at least 32 bits I believe, so there really is no possible
overflow issue for anything except tm_year - which has no specified range,
hence isn't adjusted the same way, further should it overflow when doing
the range adjustment for the other fields, it will also overflow when
generating the final struct tm result.

That's another thing that the standard currently doesn't handle at all.
It allows EOVERFLOW for when the time_t result won't fit (which is easy
to make happen with a 32 bit time_t, but almost impossible for a 64 bit
time_t), but if that doesn't happen, it requires that the struct tm
be updated to reflect the time_t - ignoring the possibility that the
generated tm_year might not fit (that's easy to make happen, just set the
input to mktime() with tm_year set to MAXINT, and tm_mon set to MAXINT as
well (and for this we will assume the other fields are all in range) - then
the resulting tm_year (assuming the conventional method of adjusting tm_mon)
would need to be MAXINT + MAXINT/12 which is clearly > MAXINT, and hence
cannot fit in the int which tm_year is required to be.

Another case where the current standard (even worse with that added in
1613 bug fix applied) specifies the un-implementable.

  | Much easier to copy each value to a variable of a wider type and then
  | do the range adjustments on those.

Aside from tm_year (which as above, has other issues) all the other fields
have a very small range.   For this, ignoring that tm_mday's max value
depends upon tm_year and tm_mon (which just complicates the code, doesn't
change its nature), reducing values to be in range can be done (roughly) as

        adjust = 0;
        while (field < min_field_value) {
                field += (max_field_value - min_field_value + 1);
                adjust--;
        }
        while (field > max_field_value) {
                field -= (max_field_value - min_field_value + 1);
                adjust++;
        }

Only one of the two loops ever runs of course.   That can be done more
efficiently using division and modulus (though getting those right when
using negative numbers always challenged me... and that way gets quite
complex when the field is tm_mday) but that's just a detail.

Nothing overflows.   Someone adept at generating proofs of program
correctness could actually prove that, but that's not me.

After that, the next (bigger effect) field is adjusted the same way, and
once that is done, when we know it is now somewhere in the 0..100 range
(none of the affected values are in range and outside those values) we
just add the previous field's adjustment calculated above (that's limited
to MAXINT/24 in the worse case, and the now in-range field value is much
smaller than that, so this addition cannot overflow either, the biggest we
can get is smaller than MAXINT/23 (or larger than -MAXINT/23)), and then
bring this next field into range again (repeat the above, without first
setting adjust=0).

Repeat for the following fields until done.

Simple, and nothing wider than an int is needed, nothing can ever overflow
(as long as int is wide enough - which 32 bits certainly is, 16 probably is
as well, 8 might be for the fields that matter anyway).

An implementation using wider vars for the fields is certainly possible,
but on some architectures, probably slower (maybe much slower) (on others it
would likely be faster).

  | Only in your fantasy world of an idealistically "correct" mktime().
  | The standard requires the broken-down time to be converted to a time
  | since the Epoch.  It is that value which is being referred to here.

If the standard required the moon be constructed from green cheese, would
you require that to be done, or simply admit that the standard is wrong?
No matter what the standard seems to say, if it requires the impossible,
it is either being misinterpreted, or flat out wrong.   You can decide for
yourself which is happening here.

If there is no suitable "seconds from the epoch" value that can be
derived from the broken down time in *timeptr, then no matter what the
standard says, or seems to say, there is none, and one cannot simply
be invented.

If one was required to be invented, then I'll invent Dec 31, 1969,
23:59:59 (UTC) as the time to return.   That's as good as anything
else to represent a time which does not exist (which isn't to say
that it is good).

In this regard, do note that a value of 2 for tm_hour is by no-one's
definition "out of range" for the tm_hour field.   Ever.

  | How ironic.  My lack of such knowledge is the very thing that allows me
  | to interpret the standard correctly without difficulty, whereas you
  | having that knowledge is making you interpret it incorrectly.

Except that the standard is supposed to specify what the implementations
actually do, which is something one needs some knowledge of what that is
to determine.   If the written standard is specifying differently than the
implementations, the standard is wrong.

You seem to have some fantasy expectation of what mktime() is supposed to
do, which is not actually specified anywhere - but that given that the
specification is so weak, you're dreaming that it actually specifies your
fantasy.   It doesn't - it actually specifies just about nothing.

  | >   | > Agreed.   But we need to pick tm_isdst = 0 or tm_isdst = 1, and
  | >   | > which we pick will alter what time_t value gets returned.   There's
  | >   | > nothing anywhere that suggests which one should be selected.
  | >   |
  | >   | Correct, and since the standard is silent on this, either behaviour
  | >   | is allowed.
  | > 
  | > But why just those two,
  |
  | Huh? *You* just said "we need to pick tm_isdst = 0 or tm_isdst = 1".

That's not the "those two" I was referring to.   They were the two specific
time_t results which you want to require as alternative permitted results.
There are many more which would be just as rational to return in this case,
assuming that there was any good reason not to just return an error, which
is the sane behaviour, not just the 2 results which suit your view of how
things should work.

  | Don't be ridiculous.  Your condition stated a few lines above was
  | "if I want to add 5 minutes to a time_t", given that you can't just
  | add an appropriate number of seconds (or some other unit) to a time_t.
  | To achieve that you would certainly *not* set tm_isdst=-1, as that
  | would not be the equivalent of adding some number to a time_t.

I certainly would set tm_isdst to -1, as that's what the mktime()
definition says it should be (unless I know whether the time is
standard or summer time, which in this case, I do not - after it
has been adjusted).   That's what the standard says to do.   This
part is (somehow, given the state of the rest of it) clear.

If I was using UTC (using gmtime() and timegm() instead of localtime()
and mktime()) then I would set tm_isdst=0, as there I know there is never
summer time to complicate things (though setting it to -1 would work as
well, as the implementation knows that as well, all UTC times are tm_isdst==0).
That would be the correct way to implement arithmetic when we don't
know that we can simply add to (or subtract from) a time_t variable,
using local time is fraught with dangers for this purpose.  I would advise
against anything which suggests that it is safe or sane to attempt
(nothing currently in POSIX, or as best I can tell, in C either, suggests
that it is).

If we were doing inventions, we'd just specify add_time() and sub_time()
functions, with a time_t result and arg, and some form for additional
args saying how much to add/subtract.    But we're not doing that, and
it seems it would be too late now anyway.   (On POSIX those functions could
simply convert the additional arg(s) to seconds, and just add that to the
input time_t (with overflow protection of course) - on non POSIX C systems
something more complex would be needed, but as long as the implementation
of the functions is implementation defined, those could take advantage of
whatever form the particular format of a time_t happens to be, and not
need to detour through a struct tm).

  | It follows from the fact that mktime() is required to use the original
  | struct tm field values when applying the formula in XBD 4.16.

That's not a fact, it makes no sense, and simply could not work.  Further
that section itself says as much.

kre

Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should

Reply via email to