Re: printf (the utility) expected range of integer values
    Date:        Mon, 26 Oct 2020 15:02:26 +0100
    From:        Joerg Schilling
    Message-ID:  <5f96d6f2.jkFuBT5X4/F/wqwv%joerg.schill...@fokus.fraunhofer.de>

  | If the code you are using is from FreeBSD (Garret Damore)

Where it originated I don't know for sure, but it has been in the NetBSD source tree since 1993, which I think means it came from a CSRG BSD distribution (the log doesn't indicate explicitly) - after which it has had numerous fixes and updates by various NetBSD developers over the (many) years.

The code does contain a (no longer used, that is, #if 0 surrounded) sccsid from CSRG:

    static char sccsid[] = "@(#)printf.c 8.2 (Berkeley) 3/22/95";

but that appeared when 4.4-lite2 was merged in 1997, the original in the NetBSD source tree contained

    static char sccsid[] = "@(#)printf.c 5.9 (Berkeley) 6/1/90";

if that's any help for you to work out the lineage and/or compare it to the FreeBSD version if that is (for some reason) relevant to anything.

But none of this is the issue, our code does the same thing as bosh/bash/dash/yash for the issues in question. Those are related to what the standard actually requires, rather than what any of the current implementations actually does (except in as much as those indicate what the standard should say).

The following has absolutely nothing to do with the issue I raised, but since you included it:

  | The code from bosh has been written from scratch to fully support
  | %n$ and this is what we should add to the standard in the near future.

I'm not sure this is really required in the printf utility, as distinct from the printf (family of) functions, and it causes all kinds of issues because of the way the utility reprocesses the format over and over until the args have all been used.

Eg: consider

    printf '%1$d %4$d %2$d\n' 1 2 3 4 5 6 7 8 9 10 11 12

What is supposed to be printed from that?

Bosh appears to print

    1 4 2
    5 8 6
    9 12 10

Now consider some other locale (the only reason for supporting this stuff at all is when locales need to print the args in different orders or different actual args, in order to correctly represent the language conventions) where the format string that is used is

    '%1$d %3$d %2$d\n'

With that one, bosh prints

    1 3 2
    4 6 5
    7 9 8
    10 12 11

Now it has run the format string 4 times instead of just 3 previously, and has printed all the args, whereas previously it never printed 3, 7 or 11.

How is this useful?

For this, I'm not blaming the bosh printf implementation, given what it is being asked to do, there is very little else that it can do, though in the first example, after args 1 4 and 2 were used, it could have removed them and left 3 5 6 7 8 9 10 11 12 for the next iteration of the format string. I cannot think of a sane use for this, but at least it would end up using all of the args, and always run the format string the same number of times (presuming all formats actually consume the same number of args).

The problem really is that where this method was invented, for printf() the function, the format string is only ever used once, if some of the args are never used, they're simply ignored. That can be used in a sensible way. With printf re-using the format string it really can't.

One possible solution would be that if any n$ strings appear in the format string conversions, then the format is not restarted, even if all of the args are not consumed. That might make the system workable (as it then makes it equivalent to printf()). But that isn't what bosh does.

kre
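The output quoted above can be reproduced with a few lines of POSIX shell. What follows is only an emulation of the reported behaviour, not the bosh code: it assumes that each pass over the format consumes as many arguments as the highest %n$ index that appears in it. The function name emulate_numbered and its two leading parameters (the window size and the list of positions) are hypothetical, chosen for this sketch.

```sh
# emulate_numbered STRIDE "POSITIONS" ARGS...
# Hypothetical helper: prints, for each window of STRIDE arguments, the
# arguments found at the given 1-based POSITIONS within that window.
emulate_numbered() {
    stride=$1
    positions=$2
    shift 2
    while [ "$#" -gt 0 ]; do
        line=
        for p in $positions; do
            # Fetch the p-th remaining argument (empty if there is none).
            eval "val=\${$p-}"
            line="${line:+$line }$val"
        done
        printf '%s\n' "$line"
        # Drop one window of arguments, or whatever is left of the last one.
        if [ "$#" -ge "$stride" ]; then
            shift "$stride"
        else
            shift "$#"
        fi
    done
}

# '%1$d %4$d %2$d\n' has highest index 4: three passes, args 3 7 11 unused.
emulate_numbered 4 "1 4 2" 1 2 3 4 5 6 7 8 9 10 11 12

# '%1$d %3$d %2$d\n' has highest index 3: four passes, every arg printed.
emulate_numbered 3 "1 3 2" 1 2 3 4 5 6 7 8 9 10 11 12
```

Changing only the highest index in the "translated" format changes the number of passes, which is the effect the message objects to.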
Re: printf (the utility) expected range of integer values
Robert Elz wrote:

> Date:        Mon, 26 Oct 2020 15:02:26 +0100
> From:        Joerg Schilling
> Message-ID:  <5f96d6f2.jkFuBT5X4/F/wqwv%joerg.schill...@fokus.fraunhofer.de>
>
>   | If the code you are using is from FreeBSD (Garret Damore)
>
> Where it originated I don't know for sure, but it has been in the NetBSD source tree since 1993, which I think means it came from a CSRG BSD distribution (the log doesn't indicate explicitly) - after which it has had numerous fixes and updates by various NetBSD developers over the (many) years.
>
> The code does contain a (no longer used, that is, #if 0 surrounded) sccsid from CSRG:
>     static char sccsid[] = "@(#)printf.c 8.2 (Berkeley) 3/22/95";
> but that appeared when 4.4-lite2 was merged in 1997, the original in the NetBSD source tree contained
>     static char sccsid[] = "@(#)printf.c 5.9 (Berkeley) 6/1/90";

OK, I checked the NetBSD repo and it does not seem to be related to the FreeBSD version.

> The following has absolutely nothing to do with the issue I raised, but since you included it:
>
>   | The code from bosh has been written from scratch to fully support
>   | %n$ and this is what we should add to the standard in the near future.
>
> I'm not sure this is really required in the printf utility, as distinct from the printf (family of) functions, and causes all kinds of issues because of the way the utility reprocesses the format over and over until the args have all been used.

We are currently adding gettext(1) to POSIX and you need support for %n$ if you like to use gettext in a useful way in shell scripts.

> Eg: consider
>
>     printf '%1$d %4$d %2$d\n' 1 2 3 4 5 6 7 8 9 10 11 12
>
> What is supposed to be printed from that?
>
> Bosh appears to print
>
>     1 4 2
>     5 8 6
>     9 12 10

This is identical to the output you get from ksh93 and from the FreeBSD printf.

What you see is a side effect of the two constraints:

1) Be compatible with the current POSIX standard

2) Support %n$ in a useful way

> Now consider some other locale (the only reason for supporting this stuff at all is when locales need to print the args in different orders or different actual args, in order to correctly represent the language conventions) where the format string that is used is
>
>     '%1$d %3$d %2$d\n'
>
> With that one, bosh prints
>
>     1 3 2
>     4 6 5
>     7 9 8
>     10 12 11
>
> Now it has run the format string 4 times instead of just 3 previously, and has printed all the args, whereas previously it never printed 3 7 or 11.
>
> How is this useful?

This is identical to the output you get from ksh93 and from the FreeBSD printf.

There is a simple rule of thumb: If you like to use %n$ for localization, use a matching number of arguments and % units with printf(1).

Jörg

--
EMail:jo...@schily.net (home) Jörg Schilling D-13353 Berlin
Blog: http://schily.blogspot.com/
URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/
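A script that wants to follow this rule of thumb could enforce it with a small wrapper, so that the format is never run a second time. The sketch below is one possible guard, not anything defined by POSIX or shipped with any shell; the name printf_once and the convention of passing the expected argument count first are assumptions made for the example.

```sh
# printf_once COUNT FORMAT ARGS...
# Hypothetical wrapper: refuses to call printf unless exactly COUNT
# arguments follow the format, so the format string is applied once only.
printf_once() {
    expected=$1
    fmt=$2
    shift 2
    if [ "$#" -ne "$expected" ]; then
        echo "printf_once: expected $expected arguments, got $#" >&2
        return 2
    fi
    printf "$fmt" "$@"
}

# Matching count: the format is used exactly once.
printf_once 3 '%s %s %s\n' one two three

# Mismatched count: rejected instead of silently re-running the format.
printf_once 3 '%s %s %s\n' one two three four || echo "rejected"
```

The caller still has to know how many conversions the (possibly translated) format contains, which is the weakness of the rule that the follow-up message points out.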
Re: printf (the utility) expected range of integer values
    Date:        Mon, 26 Oct 2020 19:18:10 +0100
    From:        Joerg Schilling
    Message-ID:  <5f9712e2.+zlga0iaqihkovkz%joerg.schill...@fokus.fraunhofer.de>

  | There is a simple rule of thumb: If you like to use %n$ for localization,
  | use a matching number of arguments and % units with printf(1).

This is the problem, in two ways ... it isn't good enough to define something in a way where to usefully use it you also need to add a "simple rule of thumb" - much better to simply specify something that actually works, and only define it to work when used that way. Getting meaningless output from multiple implementations that copied each other (one did it first, badly, then the others copied) is not useful.

Can you think of any rational use of a format with %n$ conversions where you would ever want to process the format string more than once? If not, why not just forbid it (and fix the implementations that now exist).

Second, like a lot of localisation issues, all of this is too intertwined with what is needed to support European languages. It isn't always possible to follow that rule of thumb with other languages where additional elements may need to be added (or not added) to certain sentences.

kre
Re: printf (the utility) expected range of integer values
Robert Elz via austin-group-l at The Open Group wrote:

> I should have included dash and yash in that list - their error messages are very similar to what /usr/bin/printf on NetBSD prints (and the NetBSD sh, which uses the same source code for its builtin printf), but when I looked closer, I can see they are not actually the same - so those clearly have a builtin printf as well (they behave the same way as bash, the NetBSD sh and bosh).

If the code you are using is from FreeBSD (Garret Damore) then there are some minor bugs in it. Sorry, I no longer remember the exact problems.

The code from bosh has been written from scratch to fully support %n$ and this is what we should add to the standard in the near future.

Jörg

--
EMail:jo...@schily.net (home) Jörg Schilling D-13353 Berlin
Blog: http://schily.blogspot.com/
URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/
Re: printf (the utility) expected range of integer values
Robert Elz wrote, on 24 Oct 2020:
>
> Is there somewhere, anywhere, where it is possible to infer what range of values printf (the utility, not the C library function) is expected to handle?
>
> I can find nothing in the XCU 3.printf page, nor in XBD 5 (and also not in XBD 12, which would be another plausible place).

You missed XBD 12.1 list item 6?

> XBD 14.limits.h gives the minimum allowed value for the maximum value of an integer (2^31 - 1) (ie: requires at least 32 bit int), but I can find nothing that says explicitly that that applies to printf the utility.

LONG_MAX is the one that applies, via XCU 1.1.2.1 Arithmetic Precision and Operations:

    Integer variables and constants, including the values of operands and option-arguments, used by the standard utilities listed in this volume of POSIX.1-2017 shall be implemented as equivalent to the ISO C standard signed long data type

> Further, since printf (the utility) is really just converting text strings from one format to another, there's really no reason that there needs to be any limit at all - there's no particular reason that integers thousands of digits long couldn't be handled. The standard does say that if overflow occurs, an error message, and non-zero exit status, must occur, but it doesn't ever say that overflow must occur.

XCU 1.1.2.1 implies there must be an upper limit, since signed long in ISO C is a fixed-width type. However, I suppose an implementation could get around that by claiming printf works as if implemented in a C programming environment where the number of hex digits in LONG_MAX is greater than ARG_MAX, and thus a too-large value could never be passed to (an external) printf.

> Second question - if overflow does occur (at whatever point) what is the value that must be printed (in addition to the error message) from a numeric conversion.
> [...]
> what the standard says is:
>
>     If an argument operand cannot be completely converted into an internal value appropriate to the corresponding conversion specification, a diagnostic message shall be written to standard error and the utility shall not exit with a zero exit status, but shall continue processing any remaining operands and shall write the value accumulated at the time the error was detected to standard output.
>
> The question is, what is "the value accumulated at the time the error was detected".

That would obviously depend on internal implementation details. In particular, digits could be processed left-to-right or right-to-left, so the unused digits could be from either end.

> What zsh does is:
>
>     zsh $ printf '%d\n' 0xc000
>     zsh: number truncated after 15 digits: c000
>     1152917106560335872
>
> which makes some sense to me, I had been thinking this might be the correct value, before I started testing to see what was produced. That is, after the first 15 hex digits are consumed, that is the value (0xc00 in decimal) and then when an attempt is made to add one more zero, we detect the overflow, and so the value that had been accumulated when the overflow was detected was 1152917106560335872 (when printed via %d).
>
> The value "everybody" else prints, 9223372036854775807, is simply 2^63-1 (the max possible value) which most likely was never actually encountered during the conversion, but is just what strtoll() returns as its value.

I think that's allowed, via the usual "as if" rule. An implementor could claim that printf is implemented as if the "accumulated" value is simply incremented until converting it to a hex string produces the same digits as were supplied, or the maximum representable accumulated value is reached.

--
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
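The three requirements in the quoted paragraph (a diagnostic on standard error, a non-zero exit status, and continued processing of the remaining operands) can be checked against whatever printf the local shell provides. The snippet below is a rough probe rather than a conformance test: it assumes the local printf uses integers far narrower than 50 decimal digits, and it deliberately does not check which value is written for the failed conversion, since that is the point under dispute in this thread. The variable names are arbitrary.

```sh
# A 50-digit value, assumed to be far beyond the range of the local printf.
big=99999999999999999999999999999999999999999999999999

# Discard the diagnostic, capture standard output and the exit status.
# The second operand (7) should still be converted after the failure.
status=0
out=$(printf '%d\n' "$big" 7 2>/dev/null) || status=$?

echo "exit status: $status"
echo "value written for the overflowing operand: $(printf '%s\n' "$out" | head -n 1)"
echo "value written for the next operand: $(printf '%s\n' "$out" | tail -n 1)"
```

Per the reports in this thread, the first value will be 9223372036854775807 on most implementations, while zsh prints the partially accumulated value and ksh93 reports no error at all.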
Re: printf (the utility) expected range of integer values
    Date:        Sat, 24 Oct 2020 19:22:07 + (UTC)
    From:        shwaresyst
    Message-ID:  <1984361807.3011984.1603567327...@mail.yahoo.com>

  | Could an implementor represent integers as an internal
  | form with 0 bits

You're concentrating on a reductio-ad-absurdum comment I threw in, obviously no-one is going to do that, but if you consider that 0 bits is impossible, then try 2 bits instead, same question.

But:

  | No, the standard requires the internal representation to be two's
  | complement for conforming applications;

Not for the internal format used by printf when converting one string form to another string form. For that the implementation can use whatever it likes - I have considered using bignums (like bc/dc do) to allow really big values to be handled. 2's complement is irrelevant, printf (the utility) does no arithmetic on the values. All it does is print them.

The rest of what you write in that paragraph is irrelevant to printf, which is a string manipulation program in reality. What the implementation provides for programs that manipulate integers has no bearing on printf which (at least as specified) doesn't do that (though the implementation might, and most likely all currently do, use a form of integer for the internal representation - none use "int" it seems though, they all use a wider variant of that).

That question remains unanswered, is there a range that the printf utility is required to support, and if so, where is that specified?

  | The last value to be output on error, nominally, is the one before a
  | multiply by 10 or add of next digit causes the overflow, is how I'd
  | construe it.

Yes, that's how I read the text, or at least that looks to be the most likely meaning. It just isn't what anyone (except zsh - and I didn't test it for decimal input, yet anyway) actually does. Since the standard is supposed to be saying what the users can expect of implementations, then it appears that the standard is wrong (given that we both read what it currently says the same way).

  | For a short %d, I'd expect "32769" to output "3276",

There's no such thing as a short in the printf utility, but if the range of integers handled in that utility was limited to [-32768..32767] then yes, that's how I'd read it too. But assuming that there was a strtos() (akin to strtol() etc - but generating a short) that function would return 32767 for any input string 32767 or bigger (and if bigger, also set errno). If this mythical printf utility with the restricted range was using this mythical strtos() function to convert the string form to its internal representation, it would print 32767 in the case you give - the printf utility never sees the 3276 value, just the input string "32769" and the output from strtos() 32767.

This is an exact analog of what most actual printf implementations seem to do, except using 64 bits instead of 16. The ksh93 implementation I can kind of understand, they clearly just ignore overflow, which means they're not using strtoll() (or similar) but probably some home-brewed conversion function.

How the Solaris version that Alan Coopersmith told us about (in a message after yours) (and thanks for that info) gets -1 as the value to print is kind of mind boggling though. That's just weird. The GNU version Alan also included is just the same as most of the others it seems.

zsh remains the only one that does what the standard seems to require to be done.

kre
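The two readings contrasted in this message can be put side by side in a small model. The sketch below uses the 16-bit limit from the "32769" example and handles unsigned decimal digit strings only; accumulate and saturate are hypothetical names, and neither function is taken from any real printf. The first stops at the value accumulated before the digit that would overflow, the second behaves like the mythical strtos() and reports the maximum.

```sh
# accumulate MAX DIGITS
# Converts left to right and stops before the step that would exceed MAX,
# printing the value accumulated so far (status 1 on overflow).
accumulate() {
    max=$1
    rest=$2
    acc=0
    while [ -n "$rest" ]; do
        digit=${rest%"${rest#?}"}    # first character of the remainder
        rest=${rest#?}
        # acc*10 + digit > max, tested without performing the overflow.
        if [ "$acc" -gt $(( (max - digit) / 10 )) ]; then
            printf '%s\n' "$acc"
            return 1
        fi
        acc=$(( acc * 10 + digit ))
    done
    printf '%s\n' "$acc"
}

# saturate MAX DIGITS
# Models a strtol()-style converter: on overflow the result is MAX itself.
saturate() {
    if value=$(accumulate "$1" "$2"); then
        printf '%s\n' "$value"
    else
        printf '%s\n' "$1"
        return 1
    fi
}

accumulate 32767 32769 || echo "(overflow reported)"
saturate 32767 32769 || echo "(overflow reported)"
```

With these inputs the first call prints 3276 and the second prints 32767, matching the two outcomes described above; both report the overflow through their exit status.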
Re: printf (the utility) expected range of integer values
> Could an implementor represent integers as an internal form with 0 bits (in which the only value that doesn't overflow is 0) and hence always print 0 for any %d (%u/%x/%d) conversion, with an error message about overflow for any value with any bits set?

No, the standard requires the internal representation to be two's complement for conforming applications; other internal format use is considered unspecified behavior. While a utility may support other formats, it is implicit by default they support two's complement also for interaction with those applications. This ties into the ranges that can be expected to be output, which lie between the *_MIN and *_MAX values from the <limits.h> used to compile the utility, and supposedly the implementation as a whole. If something to this effect really needs to be added it would go in XBD 2 as an implementation conformance requirement, I'd think.

The last value to be output on error, nominally, is the one before a multiply by 10 or add of next digit causes the overflow, is how I'd construe it. For a short %d, I'd expect "32769" to output "3276", as the most digits capable of fitting in a 16 bit 2's comp. internal format as an actual value.

On Saturday, October 24, 2020 Robert Elz wrote:

>     Date:        Sat, 24 Oct 2020 16:47:41 + (UTC)
>     From:        shwaresyst
>     Message-ID:  <160402159.2963847.1603558061...@mail.yahoo.com>
>
>   | The text relevant to all this I see is the paragraph at line 104150, page 3114, c181.pdf,
>
> That is the text I quoted in the previous message (I got it from 202x d1.1 but that's irrelevant, the page & line numbers have changed, but the words are the same).
Re: printf (the utility) expected range of integer values
On 10/24/20 11:05 AM, Robert Elz via austin-group-l at The Open Group wrote: It might be useful to know what the printf utility (the one from the filesystem) outputs for /path/to/printf '%d\n' 0xc000 on Solaris, AIX, HPUX, Linux, MacOS, and anything else similar anyone can test that on. If you get 18446673704965373952 and no error message, then please try with more 0's appended to actually force overflow to happen. On Solaris 11.4: % /usr/bin/printf '%d\n' 0xc000 printf: 0xc000: Result too large -1 % /usr/gnu/bin/printf '%d\n' 0xc000 printf: ‘0xc000’: Result too large 9223372036854775807 (Same results on both SPARC & x86. 64-bit binaries on both.) -- -Alan Coopersmith- alan.coopersm...@oracle.com Oracle Solaris Engineering - https://blogs.oracle.com/alanc
Re: printf (the utility) expected range of integer values
A couple of messages back I wrote: | But it is obvious that at least the NetBSD sh, bash, bosh, zsh, | and ksh93 have a builtin printf (the error messages differ...) I should have included dash and yash in that list - their error messages are very similar to what /usr/bin/printf on NetBSD prints (and the NetBSD sh, which uses the same source code for its builtin printf), but when I looked closer, I can see they are not actually the same - so those clearly have a builtin printf as well (they behave the same way as bash, the NetBSD sh and bosh). It might be useful to know what the printf utility (the one from the filesystem) outputs for /path/to/printf '%d\n' 0xc000 on Solaris, AIX, HPUX, Linux, MacOS, and anything else similar anyone can test that on. If you get 18446673704965373952 and no error message, then please try with more 0's appended to actually force overflow to happen. kre ps: the FreeBSD sh may have its own builtin printf when running on FreeBSD, but the version I have either seems not to have printf built in, or built in the NetBSD printf (or happens to have something that acts identically). I could check the sources, and the way I built it, but it isn't really important.
Re: printf (the utility) expected range of integer values
    Date:        Sat, 24 Oct 2020 16:47:41 + (UTC)
    From:        shwaresyst
    Message-ID:  <160402159.2963847.1603558061...@mail.yahoo.com>

  | The text relevant to all this I see is the paragraph at line 104150,
  | page 3114, c181.pdf,

That is the text I quoted in the previous message (I got it from 202x d1.1 but that's irrelevant, the page & line numbers have changed, but the words are the same). For reference, here it is again:

    If an argument operand cannot be completely converted into an internal value appropriate to the corresponding conversion specification, a diagnostic message shall be written to standard error and the utility shall not exit with a zero exit status, but shall continue processing any remaining operands and shall write the value accumulated at the time the error was detected to standard output.

  | which limits outputs to the internal representation range of
  | the format characters used, converted back to text.

Yes. But what does that actually mean to someone who wants to use printf (the utility) and wants to be sure it will be able to print the numbers needed?

Could an implementor represent integers as an internal form with 0 bits (in which the only value that doesn't overflow is 0) and hence always print 0 for any %d (%u/%x/%d) conversion, with an error message about overflow for any value with any bits set? If not, what text in the standard prohibits that? We know it can't happen for printf(3) (XSH.3.fprintf) as the minimum size of a C int (in POSIX) is 32 bits. But where is the required range of printf(1) (XCU.3.printf) integers stated? Surely not nowhere?

  | This should probably be explicit that the conversion shall detect
  | overflows,

It is, particularly when combined with what is in the APPLICATION USAGE section. In c181 see page 3115, the paragraph that starts at line 104190:

    If an argument cannot be parsed correctly for the corresponding conversion specification, the printf utility is required to report an error. Thus, overflow and extraneous characters at the end of an argument being used for a numeric conversion shall be reported as errors.

This part isn't a problem, or an issue, this is quite clear, and (aside from ksh93, which is obviously broken) is what everything I tested does.

Now back to the questions from the original message, neither of which did you even attempt to answer.

Where, if anywhere, is it stated what range of integers is required to be supported by printf the utility? Or in other words, is there a smallest value which is permitted to generate an overflow (for present purposes just consider positive numbers, we can all easily extrapolate to negative when appropriate.) Further, and related, is there any value which is required to be treated as overflow (perhaps related to something in <limits.h> rather than an absolute constant in the printf page)? And if so, where is that stated?

For this, remember that printf the utility has no length modifiers for the numeric conversions (at least the integer ones, the floats aren't required at all, so obviously nothing is there to distinguish float from double, etc). That is, there is only one "kind" of integer that it is able to print, a simple %d (or %u %x %o), there is no %ld %jd %zd %lld ...

And second, when an overflow does occur, and an error message is printed to stderr (and the eventual exit status from printf when it completes is set to something greater than 0) then, as required, printf is still required to print a value for the conversion that overflowed. What value should be printed - the maximum that could be handled, which is the common result (presumably because almost everyone is using strtoll() to convert the input char string to the internal representation, and that is what strtoll() is defined to return (in addition to an error indication) when it encounters overflow)? But that's not what the standard actually says should be printed, what it says is something much more like what zsh does (see the previous message).

I thank you for taking the time to reply, but I'd prefer it if you actually read the e-mail first, and answered the questions in it, rather than just sending random related thoughts.

kre
RE: printf (the utility) expected range of integer values
The text relevant to all this I see is the paragraph at line 104150, page 3114, c181.pdf, which limits outputs to the internal representation range of the format characters used, converted back to text. This should probably be explicit that the conversion shall detect overflows, positive or negative, when converting input text, and to treat this as an error. While the C standard permits silent overflows in converting C source this makes the utility non-portable. On Saturday, October 24, 2020 Robert Elz via austin-group-l at The Open Group wrote: Is there somewhere, anywhere, where it is possible to infer what range of values printf (the utility, not the C library function) is expected to handle? I can find nothing in the XCU 3.printf page, nor in XBD 5 (and also not in XBD 12, which would be another plausible place). There doesn't seem to be anything about integers at all in XBD 3. XBD 14.limits.h gives the minimum allowed value for the maximum value of an integer (2^31 - 1) (ie: requires at least 32 bit int), but I can find nothing that says explicitly that that applies to printf the utility. Is there some expected minimum integer size for printf (the utility) that is actually specified somewhere? Further, since printf (the utility) is really just converting text strings from one format to another, there's really no reason that there needs to be any limit at all - there's no particular reason that integers thousands of digits long couldn't be handled. The standard does say that if overflow occurs, an error message, and non-zero exit status, must occur, but it doesn't ever say that overflow must occur. Second question - if overflow does occur (at whatever point) what is the value that must be printed (in addition to the error message) from a numeric conversion. Given a printf that uses 64 bit integers (which seems to be a very common choice) then what should be printed from printf '%d\n' 0xc000 ? 
printf (the utility) expected range of integer values
Is there somewhere, anywhere, where it is possible to infer what range of values printf (the utility, not the C library function) is expected to handle?

I can find nothing in the XCU 3.printf page, nor in XBD 5 (and also not in XBD 12, which would be another plausible place). There doesn't seem to be anything about integers at all in XBD 3.

XBD 14.limits.h gives the minimum allowed value for the maximum value of an integer (2^31 - 1) (ie: requires at least 32 bit int), but I can find nothing that says explicitly that that applies to printf the utility.

Is there some expected minimum integer size for printf (the utility) that is actually specified somewhere?

Further, since printf (the utility) is really just converting text strings from one format to another, there's really no reason that there needs to be any limit at all - there's no particular reason that integers thousands of digits long couldn't be handled. The standard does say that if overflow occurs, an error message, and non-zero exit status, must occur, but it doesn't ever say that overflow must occur.

Second question - if overflow does occur (at whatever point) what is the value that must be printed (in addition to the error message) from a numeric conversion.

Given a printf that uses 64 bit integers (which seems to be a very common choice) then what should be printed from

    printf '%d\n' 0xc000

?

(This is the example that made me think about all of this - we (NetBSD) have been offered a patch to make the error message go away, and the result be:

    -70368744177664

That is, treating the value as a bit pattern for the 64 bits, which then has the sign bit set, and so prints as a negative value. We will not be doing that.)

But what should we print? (In addition to the error).

Every shell I tested (with 2 exceptions) does:

    printf '%d\n' 0xc000
    -bash: printf: warning: 0xc000: Result too large or too small
    9223372036854775807

That one, obviously, is from bash.

Note that the "every shell" for this is not all that meaningful, many don't have printf built in, and so are simply running the NetBSD filesystem printf utility .. so it isn't then surprising that they all do the exact same thing as that does! But it is obvious that at least the NetBSD sh, bash, bosh, zsh, and ksh93 have a builtin printf (the error messages differ...)

But that value might not be what the standard calls for (even though it is what almost everyone does), what the standard says is:

    If an argument operand cannot be completely converted into an internal value appropriate to the corresponding conversion specification, a diagnostic message shall be written to standard error and the utility shall not exit with a zero exit status, but shall continue processing any remaining operands and shall write the value accumulated at the time the error was detected to standard output.

The question is, what is "the value accumulated at the time the error was detected".

What zsh does is:

    zsh $ printf '%d\n' 0xc000
    zsh: number truncated after 15 digits: c000
    1152917106560335872

which makes some sense to me, I had been thinking this might be the correct value, before I started testing to see what was produced. That is, after the first 15 hex digits are consumed, that is the value (0xc00 in decimal) and then when an attempt is made to add one more zero, we detect the overflow, and so the value that had been accumulated when the overflow was detected was 1152917106560335872 (when printed via %d).

The value "everybody" else prints, 9223372036854775807, is simply 2^63-1 (the max possible value) which most likely was never actually encountered during the conversion, but is just what strtoll() returns as its value.

kre

ps: the other shell which didn't produce 9223372036854775807 was ksh93, which actually does

    ksh93 $ printf '%d\n' 0xc000
    -70368744177664

Sad that. Good thing that we don't use ksh as the basis of the standard!