Date: Fri, 16 Dec 2022 17:31:03 +0000 From: "Geoff Clare via austin-group-l at The Open Group" <austin-group-l@opengroup.org> Message-ID: <Y5yrV7nwvj676P/r@localhost>
| Before I get into detailed responses, please note this is my last | working day before the holiday break, so I won't be contributing to | this discussion further until January. OK. I think this discussion has mostly reached a dead end anyway, nothing (relevant to the topic anyway) is changing in any of the recent messages. | I may have misled you a little in the way I worded a previous email. | It's not the time_t type itself that you can't do arithmetic on, No, not misled, I knew what you meant, and I understand that arithmetic types allow arithmetic, what matters is whether that arithmetic makes sense in context or not. | The description of time() says: | | The time function determines the current calendar time. The | encoding of the value is unspecified. That's what I was missing. I don't know my way around the C standard, and don't know where to ask people to look. | I agree with "related" but not with "corresponds". By saying | "corresponds" you are assuming that the conversion is reversible, | i.e. that it is a one-to-one mapping. It is not. No, no such assumption, since mktime() input can have out of range values, and inverting the time_t that results will never produce that, it is clear that nothing can (necessarily) be reversed. If you think that "corresponds" implied that, then by all means we can pick a different word. But "related" isn't really strong enough either (Wednesday is related to Tuesday, it is the following day, but that doesn't mean that if the input to mktime() specifies a date that is a Tuesday, it is OK to return the following Wednesday instead, just because they're related). The mktime() input must specify precisely what time_t value is to be returned, otherwise the function is useless - calling functions (apart from random number generators) that return results which are not what the input requests be returned is a waste of time. | > The first thing to note is that this only applies to UTC times. | Hence the "corrected for timezone and any seasonal time adjustments" | in the preceding mktime() quote. Yes, but we cannot make that correction until we have a UTC time to correct, we don't know what correction to apply until after that is done. This is something of a dilemma, as the input is given in the local timezone, but without enough information to allow that correction to be made, until after we have found the corresponding time_t (UTC) value (in general, and at the very least, until after we have a properly in range, and well defined, local time value). | If the standard meant local time here it would say "local time". | The fact that it instead says "actual time of day" shows that it | does *not* mean local time. I agree it doesn't mean only local time, but recall the actual time of day is local time. If the standard said "local time" it could be read as simply meaning that local time is unspecified (which it largely is) rather than also meaning that the system's clock is not necessarily synchronised with that local time (or UTC), which it is also saying. It means both. It could require synchronised times (for some applications that's needed), but doesn't (and shouldn't in general), but it cannot specify how local time works (which includes how it corresponds to UTC), that's outside of POSIX's jurisdiction. | As quoted above, mktime() first converts the broken-down time to UTC | seconds since the Epoch It can't. And nothing in the standard says that it should, as that would be absurd, one cannot convert a local time into a UTC time (however that is reckoned, here as a count of seconds since the Epoch, but that detail is irrelevant) without knowing the local timezone information first. | and then corrects it for "timezone and any seasonal time adjustments". No, that isn't what it says at all. If it did it would be ridiculous. But it doesn't. What it does say (and today I'm quoting from Issue 7 TC2 (2018 edition, ie: c181, not that, other than the page and line numbers, it makes any difference, this part has not been changed (yet anyway)). Page 1331, lines 44305-7: The mktime( ) function shall convert the broken-down time, expressed as local time, in the structure pointed to by timeptr, into a time since the Epoch value with the same encoding as that of the values returned by time( ). So clearly, we have a local time as input. Not disputed I believe. Then, same page, lines 44315-8 The relationship between the tm structure (defined in the <time.h> header) and the time in seconds since the Epoch is that the result shall be as specified in the expression given in the definition of seconds since the Epoch (see XBD Section 4.16, on page 113) corrected for timezone and any seasonal time adjustments, Do you see any "and then" in there? Rather it says that the result is as in "the expression in XBD 4.16 corrected for ..." We need to make the corrections in order to run that expression, as the expression is only defined for UTC values. It has to be, there simply is no simple formula that can possibly apply to local time values, they're not a continuous mapping. | The description of TZ in XBD 8.3 provides | the details of what the timezone and seasonal time adjustments are | (for values not beginning with colon - remember that we are discussing | something I claimed only for those values). That might be all you want to discuss, but it is *not* what we are discussing. mktime() needs to work for any TZ setting that can possibly exist, the Issue 7 non-colon form, its colon form, and the new form to be added in Issue 8. There's no point making it work (or specified) only for the one form of TZ specification which is functionally useless. How would an application ever deal with that? You can continue limiting yourself to that meaningless case if you like (though you're no more correct there than in the cases that matter - even in the old style TZ strings, one must know the (in range) month, mday, and time of day, to determine whether the summer time, or standard time, offset applies). But as long as you keep doing that, everyone else can safely ignore you, as the result cannot possibly be useful for the standard going forward. | Now that I see this text can be misinterpreted, I will include a | clarification when I propose wording changes. Make sure that your clarification is in line with what makes sense, the time since the Epoch value to be returned must be derived from the broken down time that is the (indirect) arg to the function (the actual arg is the pointer to it, but that's an irrelevant detail). | I said this was my attitude to the function. I made no claims about | the standard. Fine, but all we are discussing here (or what we should be discussing) is what the standard currently requires. Once we know that, we can determine whether that is correct or not (OK, I think we already agree that it isn't) - but at that point we know that we are changing the standard (the written standard) not just clarifying what it said all along. | You misunderstand what my quotes implied. What I was trying to say is | that mktime() accepts broken-down times that the proverbial | man-in-the-street would consider incorrect. OK, but there was never a need to say that, it was pointless. We all know that the input to mktime() can have outlandish values, as would not be considered a sane local time value by that proverbial man. That's not useful information to repeat. | I was talking about what happens in practice, not about what's specified. OK, good, we're agreeing that what is in the standard does not specify what implementations actually do, and now we should be specifying that as a correction (a complete change really). | > But this one is not the de-facto standard. It is (like the others) | > clearly not in any written standard, but here, implementations (as you | > know) differ. This might be what you'd like to see happen, but it is | > not necessarily what does happen. | | I would say it definitely is the de-facto standard. Something can be | a de-facto standard without being universally followed. That's true. | So far we | know of dozens of systems that behave the same and only one system | (NetBSD) that behaves differently. But that's not, you yourself showed that FreeBSD and MacOS are also different: austin-group-l@opengroup.org (this was you) said: | Glibc produced 378663600 and "Thu Dec 31 23:50:00 1981", so it is the | same as Solaris except for the change being at the wrong time. | FreeBSD and MacOS returned -1. The "at the wrong time" issue will be fixed now if the relevant systems tzdata has been updated to (at least) 2022g. The rest of that, is what you discovered and reported in your message of Tue, 22 Nov 2022 12:49:13 +0000 (its Message-ID was: <Y3zFSV/bW8mrpgTb@localhost> which is not exactly compliant with the e-mail standards, "@localhost" is not good enough, but that doesn't matter here). | (Note that I was careful to say | "system" not "implementation" - Paul Eggert's C defect report said | that the Arthur Olsen implementation behaved differently in 1994, but | that is not a "system".) Doesn't really matter, that is the implementation used, with modifications, essentially everywhere these days - but all the systems do tend to modify it. It is certainly what is in glibc() (modified). | It's a shame that while mktime() implementations all do this case the | same way, for gap times some give 01:30 and some 03:30. And some give an error. You provided the evidence of that (beyond just NetBSD). (And we only know about the few systems that have been tested so far, there are more). | It would have been good to be able to require one of the two results, [three results] And yes, it would be, and in this case, the error return is the one that makes most sense, the others hide the truth. The same issue applies in the "fold back" period, but is perhaps worse, as there's no way there for the application to see something odd happened. But I agree, since we are amending the standard to fit with the existing implementations, which all seem to comply with what the existing standard actually requires (almost nothing) those areas where the implementations disagree need to be left as unspecified or implementation defined. | There's no reason to expect the Feb 29 case to produce a single | unambiguous result either. It is just fortunate that it does. Yes. In this regard everyone copied (in one sense or another) the first implementation of normalising the struct tm values it seems, rather than doing it their own way. | Yes, we'll need to say "it is unspecified whether ..." for the cases | where existing implementations do the adjustment differently. Good, then I think we're agreed, for times in the gap, and in the fold back period (at least when tm_isdst was -1 in the input struct tm) it will be unspecified whether a time before the gap, or the first occurrence of a time in the fold back, or a time after the gap, or the second, or perhaps later occurrence of a fold back time, or in either case, an error, is returned, as that's what the various implementations (as that word is used in POSIX - more or less as a synonym for system) do. | You are interpreting the standard incorrectly because of your | knowledge of mktime() internal implementation details. You need to | read what the standard actually says and try not to be influenced by | that prior knowledge. Nonsense. I am reading what the standard actually says, which is almost nothing. You seem to be reading it to conform to (and require only) your expectation of what the implementations should do. | Everywhere the mktime() description refers to a struct tm member, it | is talking about the value that was passed in to mktime(), with the | exception of the last paragraph where it says the values are set on | successful completion. OK. | So the values used in the expression from XBD 4.16 (or 4.17 in D2.1) | are the values passed in to mktime(), with the exception of tm_yday | which is calculated from the other members. No, what it says is (as quoted above): the result shall be as specified in the expression [...] corrected for timezone and any seasonal time adjustments, The expression is corrected for timezone and seasonal time adjustments, not the result, which you are assuming. It has to be, the expression is only defined for UTC. Read XBD 4.16 (or 4.17 as appropriate) for yourself, and you cannot come to any other conclusion. It is clearly for UTC only. | It may well be that some existing implementations do the range | corrections in place in *timeptr first and then calculate tm_yday | from them. The "in place" is irrelevant, but this isn't an implementation detail, there is no way to calculate tm_yday without first knowing what tm_year, tm_mon, and tm_mday will be. Once an implementation has determined all of those values (which also require tm_sec tm_min and tm_hour to first be determined) there is no rational expectation that all that work will be thrown away, and done again later. That beggars belief. Really! | It is permitted by the usual implied "as if" rule, as long as the end | result is the same as would result from doing things in the way | described in the standard. Sure. But the standard does not say what you think it does. You just want it to say that. | I.e. a value to be used as tm_yday in | the expression is calculated from the original field values in | *timeptr, It would be interesting to see you do this, in working code, without altering those original field values - and have it correctly apply to timezones where whole days have gone missing in some particular years. (or where whole days have been repeated, which I am not sure has ever happened, but easily could). | However, it seems to me that if I was going to implement mktime() I | would do the range adjustments first Of course you would, it is the only way that makes sense. | but I would not do them in place. | Preventing overflow while doing them in place would be a nightmare. Not unless you're on a system with 8 bit int types (or smaller) - and even then for struct tm 8 bits is enough for all but tm_yday and tm_year. But posix requires at least 32 bits I believe, so there really is no possible overflow issue for anything except tm_year - which has no specified range, hence isn't adjusted the same way, further should it overflow when doing the range adjustment for the other fields, it will also overflow when generating the final struct tm result. That's another thing that the standard currently doesn't handle at all. It allows EOVERFLOW for when the time_t result won't fit (which is easy to make happen with a 32 bit time_t, but almost impossible for a 64 bit time_t), but if that doesn't happen, it requires that the struct tm be updated to reflect the time_t - ignoring the possibility that the generated tm_year might not fit (that's easy to make happen, just set the input to mktime() with tm_year set to MAXINT, and tm_mon set to MAXINT as well (and for this we will assume the other fields are all in range) - then the resulting tm_year (assuming the conventional method of adjusting tm_mon) would need to be MAXINT + MAXINT/12 which is clearly > MAXINT, and hence cannot fit in the int which tm_year is required to be. Another case where the current standard (even worse with that added in 1613 bug fix applied) specifies the un-implementable. | Much easier to copy each value to a variable of a wider type and then | do the range adjustments on those. Aside from tm_year (which as above, has other issues) all the other fields have a very small range. For this, ignoring that tm_mday's max value depends upon tm_year and tm_mon (which just complicates the code, doesn't change its nature), reducing values to be in range can be done (roughly) as adjust = 0; while (field < min_field_value) { field += (max_field_value - min_field_value + 1); adjust--; } while (field > max_field_value) { field -= (max_field_value - min_field_value + 1); adjust++; } Only one of the two loops ever runs of course. That can be done more efficiently using division and modulus (though getting those right when using negative numbers always challenged me... and that way gets quite complex when the field is tm_mday) but that's just a detail. Nothing overflows. Someone adept at generating proofs of program correctness could actually prove that, but that's not me. After that, the next (bigger effect) field is adjusted the same way, and once that is done, when we know it is now somewhere in the 0..100 range (none of the affected values are in range and outside those values) we just add the previous field's adjustment calculated above (that's limited to MAXINT/24 in the worse case, and the now in-range field value is much smaller than that, so this addition cannot overflow either, the biggest we can get is smaller than MAXINT/23 (or larger than -MAXINT/23)), and then bring this next field into range again (repeat the above, without first setting adjust=0). Repeat for the following fields until done. Simple, and nothing wider than an int is needed, nothing can ever overflow (as long as int is wide enough - which 32 bits certainly is, 16 probably is as well, 8 might be for the fields that matter anyway). An implementation using wider vars for the fields is certainly possible, but on some architectures, probably slower (maybe much slower) (on others it would likely be faster). | Only in your fantasy world of an idealistically "correct" mktime(). | The standard requires the broken-down time to be converted to a time | since the Epoch. It is that value which is being referred to here. If the standard required the moon be constructed from green cheese, would you require that to be done, or simply admit that the standard is wrong? No matter what the standard seems to say, if it requires the impossible, it is either being misinterpreted, or flat out wrong. You can decide for yourself which is happening here. If there is no suitable "seconds from the epoch" value that can be derived from the broken down time in *timeptr, then no matter what the standard says, or seems to say, there is none, and one cannot simply be invented. If one was required to be invented, then I'll invent Dec 31, 1969, 23:59:59 (UTC) as the time to return. That's as good as anything else to represent a time which does not exist (which isn't to say that it is good). In this regard, do note that a value of 2 for tm_hour is by no-one's definition "out of range" for the tm_hour field. Ever. | How ironic. My lack of such knowledge is the very thing that allows me | to interpret the standard correctly without difficulty, whereas you | having that knowledge is making you interpret it incorrectly. Except that the standard is supposed to specify what the implementations actually do, which is something one needs some knowledge of what that is to determine. If the written standard is specifying differently than the implementations, the standard is wrong. You seem to have some fantasy expectation of what mktime() is supposed to do, which is not actually specified anywhere - but that given that the specification is so weak, you're dreaming that it actually specifies your fantasy. It doesn't - it actually specifies just about nothing. | > | > Agreed. But we need to pick tm_isdst = 0 or tm_isdst = 1, and | > | > which we pick will alter what time_t value gets returned. There's | > | > nothing anywhere that suggests which one should be selected. | > | | > | Correct, and since the standard is silent on this, either behaviour | > | is allowed. | > | > But why just those two, | | Huh? *You* just said "we need to pick tm_isdst = 0 or tm_isdst = 1". That's not the "those two" I was referring to. They were the two specific time_t results which you want to require as alternative permitted results. There are many more which would be just as rational to return in this case, assuming that there was any good reason not to just return an error, which is the sane behaviour, not just the 2 results which suit your view of how things should work. | Don't be ridiculous. Your condition stated a few lines above was | "if I want to add 5 minutes to a time_t", given that you can't just | add an appropriate number of seconds (or some other unit) to a time_t. | To achieve that you would certainly *not* set tm_isdst=-1, as that | would not be the equivalent of adding some number to a time_t. I certainly would set tm_isdst to -1, as that's what the mktime() definition says it should be (unless I know whether the time is standard or summer time, which in this case, I do not - after it has been adjusted). That's what the standard says to do. This part is (somehow, given the state of the rest of it) clear. If I was using UTC (using gmtime() and timegm() instead of localtime() and mktime()) then I would set tm_isdst=0, as there I know there is never summer time to complicate things (though setting it to -1 would work as well, as the implementation knows that as well, all UTC times are tm_isdst==0). That would be the correct way to implement arithmetic when we don't know that we can simply add to (or subtract from) a time_t variable, using local time is fraught with dangers for this purpose. I would advise against anything which suggests that it is safe or sane to attempt (nothing currently in POSIX, or as best I can tell, in C either, suggests that it is). If we were doing inventions, we'd just specify add_time() and sub_time() functions, with a time_t result and arg, and some form for additional args saying how much to add/subtract. But we're not doing that, and it seems it would be too late now anyway. (On POSIX those functions could simply convert the additional arg(s) to seconds, and just add that to the input time_t (with overflow protection of course) - on non POSIX C systems something more complex would be needed, but as long as the implementation of the functions is implementation defined, those could take advantage of whatever form the particular format of a time_t happens to be, and not need to detour through a struct tm). | It follows from the fact that mktime() is required to use the original | struct tm field values when applying the formula in XBD 4.16. That's not a fact, it makes no sense, and simply could not work. Further that section itself says as much. kre