Re: printf (the utility) expected range of integer values

2020-10-26 Thread Robert Elz via austin-group-l at The Open Group
Date:Mon, 26 Oct 2020 15:02:26 +0100
From:Joerg Schilling 
Message-ID:  <5f96d6f2.jkFuBT5X4/F/wqwv%joerg.schill...@fokus.fraunhofer.de>

  | If the code you are using is from FreeBSD (Garret Damore)

Where it originated I don't know for sure, but it has been in the NetBSD
source tree since 1993, which I think means it came from a CSRG BSD
distribution (the log doesn't indicate explicitly) - after which it has had
numerous fixes and updates by various NetBSD developers over the (many) years.

The code does contain a (no longer used, that is, #if 0 surrounded)
sccsid from CSRG:
static char sccsid[] = "@(#)printf.c   8.2 (Berkeley) 3/22/95";
but that appeared when 4.4-lite2 was merged in 1997, the original
in the NetBSD source tree contained
static char sccsid[] = "@(#)printf.c   5.9 (Berkeley) 6/1/90";

if that's any help for you to work out the lineage and/or compare
it to the FreeBSD version if that is (for some reason) relevant to anything.


But none of this is the issue; our code does the same thing as
bosh/bash/dash/yash for the issues in question.


Those are related to what the standard actually requires, rather than
what any of the current implementations actually does (except in as
much as those indicate what the standard should say).

The following has absolutely nothing to do with the issue I raised, but
since you included it:

  | The code from bosh has been written from scratch to fully support
  | %n$ and this is what we should add to the standard in the near future.

I'm not sure this is really required in the printf utility, as distinct
from the printf (family of) functions, and causes all kinds of issues
because of the way the utility reprocesses the format over and over until
the args have all been used.

Eg: consider

printf '%1$d %4$d %2$d\n' 1 2 3 4 5 6 7 8 9 10 11 12

What is supposed to be printed from that?

Bosh appears to print

1 4 2
5 8 6
9 12 10

Now consider some other locale (the only reason for supporting this
stuff at all is when locales need to print the args in different orders
or different actual args, in order to correctly represent the language
conventions) where the format string that is used is

'%1$d %3$d %2$d\n'

With that one, bosh prints

1 3 2
4 6 5
7 9 8
10 12 11

Now it has run the format string 4 times instead of just 3 previously,
and has printed all the args, whereas previously it never printed 3 7 or 11.

How is this useful?
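
For reference, the two outputs above can be reproduced with a small model.
This is only a sketch in Python, not bosh's code: it assumes that each pass
over the format consumes as many operands as the highest n in any %n$d, and
it handles nothing but %n$d.  The name numbered_printf is made up for the
illustration.

```python
import re

SPEC = re.compile(r"%(\d+)\$d")

def numbered_printf(fmt, args):
    """Model: every pass consumes as many operands as the highest n in %n$d."""
    width = max(int(n) for n in SPEC.findall(fmt))
    lines = []
    for start in range(0, len(args), width):
        chunk = args[start:start + width]

        def pick(match):
            index = int(match.group(1))
            # An absent operand is printed as 0, as for a plain %d.
            return str(chunk[index - 1]) if index <= len(chunk) else "0"

        lines.append(SPEC.sub(pick, fmt))
    return lines

operands = list(range(1, 13))
print(numbered_printf("%1$d %4$d %2$d", operands))  # 3 passes
print(numbered_printf("%1$d %3$d %2$d", operands))  # 4 passes
```

Under that assumption the number of passes depends only on the highest
index mentioned in the format, which is why the two orderings disagree.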

For this, I'm not blaming the bosh printf implementation, given what it
is being asked to do, there is very little else that it can do, though
in the first example, after args 1 4 and 2 were used, it could have removed
them and left 3 5 6 7 8 9 10 11 12 for the next iteration of the format
string.   I cannot think of a sane use for this, but at least it would end
up using all of the args, and always run the format string the same number
of times (presuming all formats actually consume the same number of args).

The problem really is that where this method was invented, for printf()
the function, the format string is only ever used once, if some of the
args are never used, they're simply ignored.   That can be used in a
sensible way.   With printf re-using the format string it really can't.

One possible solution would be that if any n$ strings appear in the
format string conversions, then the format is not restarted, even if
all of the args are not consumed.   That might make the system workable
(as it then makes it equivalent to printf()).   But that isn't what bosh does.
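
As a rough illustration of that last suggestion, here is a Python sketch.
It assumes only %n$d conversions appear, and printf_once is a hypothetical
name, not any implementation's code.  Expanding the format exactly once
gives one line of output whichever ordering the locale's format uses:

```python
import re

SPEC = re.compile(r"%(\d+)\$d")

def printf_once(fmt, args):
    """Hypothetical rule: a format containing %n$ is expanded exactly once."""
    def pick(match):
        index = int(match.group(1))
        # An absent operand is printed as 0; unused operands are ignored.
        return str(args[index - 1]) if index <= len(args) else "0"

    return SPEC.sub(pick, fmt)

operands = list(range(1, 13))
print(printf_once("%1$d %4$d %2$d", operands))
print(printf_once("%1$d %3$d %2$d", operands))
```

Operands 5 to 12 are simply never used, just as unused arguments to
printf() the function are ignored.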

kre




Re: printf (the utility) expected range of integer values

2020-10-26 Thread Joerg Schilling via austin-group-l at The Open Group
Robert Elz  wrote:

> Date:Mon, 26 Oct 2020 15:02:26 +0100
> From:Joerg Schilling 
> Message-ID:  
> <5f96d6f2.jkFuBT5X4/F/wqwv%joerg.schill...@fokus.fraunhofer.de>
>
>   | If the code you are using is from FreeBSD (Garret Damore)
>
> Where it originated I don't know for sure, but it has been in the NetBSD
> source tree since 1993, which I think means it came from a CSRG BSD
> distribution (the log doesn't indicate explicitly) - after which it has had
> numerous fixes and updates by various NetBSD developers over the (many) years.
>
> The code does contain a (no longer used, that is, #if 0 surrounded)
> sccsid from CSRG:
>   static char sccsid[] = "@(#)printf.c   8.2 (Berkeley) 3/22/95";
> but that appeared when 4.4-lite2 was merged in 1997, the original
> in the NetBSD source tree contained
>   static char sccsid[] = "@(#)printf.c   5.9 (Berkeley) 6/1/90";

OK, I checked the NetBSD repo and it does not seem to be related to the FreeBSD 
version.


> The following has absolutely nothing to do with the issue I raised, but
> since you included it:
>
>   | The code from bosh has been written from scratch to fully support
>   | %n$ and this is what we should add to the standard in the near future.
>
> I'm not sure this is really required in the printf utility, as distinct
> from the printf (family of) functions, and causes all kinds of issues
> because of the way the utility reprocesses the format over and over until
> the args have all been used.

We are currently adding gettext(1) to POSIX and you need support for %n$
if you like to use gettext in a useful way in shell scripts.

> Eg: consider
>
>   printf '%1$d %4$d %2$d\n' 1 2 3 4 5 6 7 8 9 10 11 12
>
> What is supposed to be printed from that?
>
> Bosh appears to print
>
>   1 4 2
>   5 8 6
>   9 12 10

This is the identical output to what you get from ksh93 and from the FreeBSD 
printf. What you see is a side effect from the two constraints:

1)  Be compatible with the current POSIX standard

2)  Support %n$ in a useful way

> Now consider some other locale (the only reason for supporting this
> stuff at all is when locales need to print the args in different orders
> or different actual args, in order to correctly represent the language
> conventions) where the format string that is used is
>
>   '%1$d %3$d %2$d\n'
>
> With that one, bosh prints
>
>   1 3 2
>   4 6 5
>   7 9 8
>   10 12 11
>
> Now it has run the format string 4 times instead of just 3 previously,
> and has printed all the args, whereas previously it never printed 3 7 or 11.
>
> How is this useful?

This is the identical output to what you get from ksh93 and from the FreeBSD 
printf. 

There is a simple rule of thumb: If you like to use %n$ for localization, use 
a matching number of arguments and % units with printf(1).

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
  Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: printf (the utility) expected range of integer values

2020-10-26 Thread Robert Elz via austin-group-l at The Open Group
Date:Mon, 26 Oct 2020 19:18:10 +0100
From:Joerg Schilling 
Message-ID:  <5f9712e2.+zlga0iaqihkovkz%joerg.schill...@fokus.fraunhofer.de>

  | There is a simple rule of thumb: If you like to use %n$ for localization,
  | use a matching number of arguments and % units with printf(1).

This is the problem, in two ways ... it isn't good enough to define
something in a way where to usefully use it you also need to add a
"simple rule of thumb" - much better to simply specify something that
actually works, and only define it to work when used that way.

Getting meaningless output from multiple implementations that copied
each other (one did it first, badly, then the others copied) is not useful.

Can you think of any rational use of a format with %n$ conversions where
you would ever want to process the format string more than once?   If
not, why not just forbid it (and fix the implementations that now exist).

Second, like a lot of localisation issues, all of this is too intertwined
with what is needed to support European languages.   It isn't always possible
to follow that rule of thumb with other languages where additional elements
may need to be added (or not added) to certain sentences.

kre




Re: printf (the utility) expected range of integer values

2020-10-26 Thread Joerg Schilling via austin-group-l at The Open Group
Robert Elz via austin-group-l at The Open Group  
wrote:

> I should have included dash and yash in that list - their error messages
> are very similar to what /usr/bin/printf on NetBSD prints (and the NetBSD sh,
> which uses the same source code for its builtin printf), but when I looked
> closer, I can see they are not actually the same - so those clearly have
> a builtin printf as well (they behave the same way as bash, the NetBSD sh
> and bosh).

If the code you are using is from FreeBSD (Garret Damore) then there are some 
minor bugs in it. Sorry, I no longer remember the exact problems.

The code from bosh has been written from scratch to fully support %n$ and this 
is what we should add to the standard in the near future.

Jörg




Re: printf (the utility) expected range of integer values

2020-10-26 Thread Geoff Clare via austin-group-l at The Open Group
Robert Elz wrote, on 24 Oct 2020:
>
> Is there somewhere, anywhere, where it is possible to infer what
> range of values printf (the utility, not the C library function)
> is expected to handle?
> 
> I can find nothing in the XCU 3.printf page, nor in XBD 5 (and also
> not in XBD 12, which would be another plausible place).

You missed XBD 12.1 list item 6?

> XBD 14.limits.h gives the minimum allowed value for the maximum value
> of an integer (2^31 - 1) (ie: requires at least 32 bit int), but I can
> find nothing that says explicitly that that applies to printf the utility.

LONG_MAX is the one that applies, via XCU 1.1.2.1 Arithmetic Precision
and Operations:

    Integer variables and constants, including the values of operands
    and option-arguments, used by the standard utilities listed in
    this volume of POSIX.1-2017 shall be implemented as equivalent to
    the ISO C standard signed long data type

> Further, since printf (the utility) is really just converting text
> strings from one format to another, there's really no reason that there
> needs to be any limit at all - there's no particular reason that integers
> thousands of digits long couldn't be handled.   The standard does say that
> if overflow occurs, an error message, and non-zero exit status, must
> occur, but it doesn't ever say that overflow must occur.

XCU 1.1.2.1 implies there must be an upper limit, since signed long in
ISO C is a fixed-width type. However, I suppose an implementation could
get around that by claiming printf works as if implemented in a C
programming environment where the number of hex digits in LONG_MAX is
greater than ARG_MAX, and thus a too-large value could never be passed
to (an external) printf.
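
A minimal sketch of what "equivalent to signed long" means for an operand,
written in Python.  It assumes the width reported by ctypes for the C long
of the platform running the sketch; fits_signed_long is a made-up helper
name, and the parsing is Python's, not that of any printf implementation.

```python
import ctypes

# Width of the platform's C signed long: 64 bits on LP64, 32 on ILP32.
LONG_BITS = ctypes.sizeof(ctypes.c_long) * 8
LONG_MAX = 2 ** (LONG_BITS - 1) - 1
LONG_MIN = -LONG_MAX - 1

def fits_signed_long(operand):
    """Return True if the operand text converts to a value within signed long."""
    # Base 0 accepts decimal and 0x-prefixed hex; Python rejects C-style
    # leading-0 octal, so this is narrower than what printf must accept.
    value = int(operand, 0)
    return LONG_MIN <= value <= LONG_MAX

print(LONG_BITS, fits_signed_long("2147483647"), fits_signed_long("0x" + "f" * 40))
```

Because Python integers are unbounded, the comparison itself cannot
overflow; a C implementation would have to detect the overflow during
conversion instead.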

> Second question - if overflow does occur (at whatever point) what is the
> value that must be printed (in addition to the error message) from a
> numeric conversion.
> 
[...]
> what the standard says is:
> 
>  If an argument operand cannot be completely converted into an internal
>  value appropriate to the corresponding conversion specification, a
>  diagnostic message shall be written to standard error and the utility
>  shall not exit with a zero exit status, but shall continue processing
>  any remaining operands and shall write the value accumulated at the
>  time the error was detected to standard output.
> 
> The question is, what is "the value accumulated at the time the error was
> detected".

That would obviously depend on internal implementation details. In
particular, digits could be processed left-to-right or right-to-left,
so the unused digits could be from either end.

> What zsh does is:
> 
>   zsh $ printf '%d\n' 0xffffc00000000000
>   zsh: number truncated after 15 digits: ffffc00000000000
>   1152917106560335872
> 
> which makes some sense to me, I had been thinking this might be the
> correct value, before I started testing to see what was produced.
> That is, after the first 15 hex digits are consumed, that is the value
> (0xffffc0000000000 in decimal) and then when an attempt is made to
> add one more zero, we detect the overflow, and so the value that had
> been accumulated when the overflow was detected was 1152917106560335872
> (when printed via %d).
> 
> The value "everybody" else prints, 9223372036854775807, is simply 2^63-1
> (the max possible value) which most likely was never actually encountered
> during the conversion, but is just what strtoll() returns as its value.

I think that's allowed, via the usual "as if" rule.  An implementor
could claim that printf is implemented as if the "accumulated" value
is simply incremented until converting it to a hex string produces the
same digits as were supplied, or the maximum representable accumulated
value is reached.
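
The three behaviours seen in this thread can be compared with a small
Python model.  Assumptions: a 64-bit signed internal type, digits processed
left to right, and an operand of 0xffffc00000000000 (the 16-digit value
that is arithmetically consistent with every output quoted here).  The
function names are invented for the sketch and are not taken from any
shell's source.

```python
INT64_MAX = 2**63 - 1

def accumulate(hex_digits):
    """Left to right, stopping before the digit that would overflow."""
    value = 0
    for used, digit in enumerate(hex_digits):
        candidate = value * 16 + int(digit, 16)
        if candidate > INT64_MAX:
            return value, used      # value held when overflow was detected
        value = candidate
    return value, len(hex_digits)

def saturate(hex_digits):
    """What a strtoll()-style conversion hands back on positive overflow."""
    return min(int(hex_digits, 16), INT64_MAX)

def wrap(hex_digits):
    """Keep the low 64 bits and read them as two's complement."""
    bits = int(hex_digits, 16) & (2**64 - 1)
    return bits - 2**64 if bits >= 2**63 else bits

operand = "ffffc00000000000"
print(accumulate(operand))  # (1152917106560335872, 15), as zsh reports
print(saturate(operand))    # 9223372036854775807, the common result
print(wrap(operand))        # -70368744177664, the ksh93 result
```

Only the first function ever holds the value it returns as an intermediate
result; the second returns a limit that the conversion never passed through.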

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: printf (the utility) expected range of integer values

2020-10-24 Thread Robert Elz via austin-group-l at The Open Group
Date:Sat, 24 Oct 2020 19:22:07 +0000 (UTC)
From:shwaresyst 
Message-ID:  <1984361807.3011984.1603567327...@mail.yahoo.com>

  | Could an implementor represent integers as an internal
  | form with 0 bits

You're concentrating on a reductio-ad-absurdum comment I threw in,
obviously no-one is going to do that, but if you consider that 0 bits
is impossible, then try 2 bits instead, same question.

But:

  | No, the standard requires the internal representation to be two's
  | complement for conforming applications;

Not for the internal format used by printf when converting one string
form to another string form.   For that the implementation can use whatever
it likes - I have considered using bignums (like bc/dc do) to allow really
big values to be handled.   2's complement is irrelevant, printf (the utility)
does no arithmetic on the values.   All it does is print them.
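
To illustrate the point about bignums, here is a sketch in Python, whose
integers are already arbitrary precision.  hex_to_decimal is a hypothetical
helper; nothing here reflects how any existing printf is written.

```python
def hex_to_decimal(operand):
    """Convert a hex operand to decimal text with no fixed-width intermediate."""
    return str(int(operand, 16))

# 2**80 does not fit in 64 bits, yet converts without any overflow.
print(hex_to_decimal("0x1" + "0" * 20))
```

With such a representation there is no value at which overflow has to be
reported, which is exactly why the question of a required range arises.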

The rest of what you write in that paragraph is irrelevant to printf,
which is a string manipulation program in reality.   What the implementation
provides for programs that manipulate integers has no bearing on printf
which (at least as specified) doesn't do that (though the implementation
might, and most likely all currently do, use a form of integer for the
internal representation - none use "int" it seems though, they all use
a wider variant of that).

That question remains unanswered, is there a range that the printf utility
is required to support, and if so, where is that specified?

  | The last value to be output on error, nominally, is the one before a
  | multiply by 10 or add of next digit causes the overflow, is how I'd
  | construe it.

Yes, that's how I read the text, or at least that looks to be the most
likely meaning.   It just isn't what anyone (except zsh - and I didn't
test it for decimal input, yet anyway) actually does.   Since the standard
is supposed to be saying what the users can expect of implementations, then
it appears that the standard is wrong (given that we both read what it
currently says the same way).

  | For a short %d, I'd expect "32769" to output "3276",

There's no such thing as a short in the printf utility, but if the
range of integers handled in that utility was limited to [-32768..32767]
then yes, that's how I'd read it too.   But assuming that there was a
strtos() (akin to strtol() etc - but generating a short) that function
would return 32767 for any input string 32767 or bigger (and if bigger,
also set errno).   If this mythical printf utility with the restricted
range was using this mythical strtos() function to convert the string
form to its internal representation, it would print 32767 in the case
you give - the printf utility never sees the 3276 value, just the input
string "32769" and the output from strtos() 32767.
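
The difference between the two readings for this 16-bit thought experiment
can be sketched in Python.  Both functions are hypothetical models
(strtos() does not exist, as noted above), restricted to non-negative
decimal input.

```python
SHORT_MAX = 32767

def accumulate_decimal(text, limit=SHORT_MAX):
    """Digit by digit; return what was accumulated before the overflow."""
    value = 0
    for digit in text:
        candidate = value * 10 + int(digit)
        if candidate > limit:
            return value
        value = candidate
    return value

def strtos_model(text, limit=SHORT_MAX):
    """Model of the mythical strtos(): clamp to the limit and flag overflow."""
    value = int(text)
    return (limit, True) if value > limit else (value, False)

print(accumulate_decimal("32769"))  # 3276
print(strtos_model("32769"))        # (32767, True)
```

A caller of the second function sees only the clamped result and the error
flag, never the 3276 intermediate.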

This is an exact analog of what most actual printf implementations seem to
do, except using 64 bits instead of 16.   The ksh93 implementation I can
kind of understand, they clearly just ignore overflow, which means they're
not using strtoll() (or similar) but probably some home-brewed conversion
function.

How the Solaris version that Alan Coopersmith told us about (in a message
after yours) (and thanks for that info) gets -1 as the value to print is
kind of mind boggling though.   That's just weird.   The GNU version Alan
also included is just the same as most of the others it seems.   zsh remains
the only one that does what the standard seems to require to be done.

kre



Re: printf (the utility) expected range of integer values

2020-10-24 Thread shwaresyst via austin-group-l at The Open Group

> Could an implementor represent integers as an internal
> form with 0 bits (in which the only value that doesn't overflow is 0)
> and hence always print 0 for any %d (%u/%x/%d) conversion, with an error
> message about overflow for any value with any bits set?

No, the standard requires the internal representation to be two's complement 
for conforming applications; other internal format use is considered 
unspecified behavior. While a utility may support other formats, it is implicit 
by default they support two's complement also for interaction with those 
applications. This ties into the expectation that the values output lie 
between the *_MIN and *_MAX values from the <limits.h> used to compile the 
utility, and supposedly the implementation as a whole. If something to this 
effect really needs to be added it would go in XBD 2 as an implementation 
conformance requirement, I'd think. 

The last value to be output on error, nominally, is the one before a multiply 
by 10 or add of next digit causes the overflow, is how I'd construe it. For a 
short %d, I'd expect "32769" to output "3276", as the most digits capable of 
fitting in a 16 bit 2's comp. internal format as an actual value.

Re: printf (the utility) expected range of integer values

2020-10-24 Thread Alan Coopersmith via austin-group-l at The Open Group

On 10/24/20 11:05 AM, Robert Elz via austin-group-l at The Open Group wrote:

> It might be useful to know what the printf utility (the one
> from the filesystem) outputs for
>     /path/to/printf '%d\n' 0xffffc00000000000
> on Solaris, AIX, HPUX, Linux, MacOS, and anything else
> similar anyone can test that on.   If you get
>     18446673704965373952
> and no error message, then please try with more 0's appended
> to actually force overflow to happen.


On Solaris 11.4:

% /usr/bin/printf '%d\n' 0xffffc00000000000
printf: 0xffffc00000000000: Result too large
-1

% /usr/gnu/bin/printf '%d\n' 0xffffc00000000000
printf: ‘0xffffc00000000000’: Result too large
9223372036854775807

(Same results on both SPARC & x86. 64-bit binaries on both.)

--
-Alan Coopersmith-   alan.coopersm...@oracle.com
 Oracle Solaris Engineering - https://blogs.oracle.com/alanc



Re: printf (the utility) expected range of integer values

2020-10-24 Thread Robert Elz via austin-group-l at The Open Group


A couple of messages back I wrote:

  | But it is obvious that at least the NetBSD sh, bash, bosh, zsh,
  | and ksh93 have a builtin printf (the error messages differ...)

I should have included dash and yash in that list - their error messages
are very similar to what /usr/bin/printf on NetBSD prints (and the NetBSD sh,
which uses the same source code for its builtin printf), but when I looked
closer, I can see they are not actually the same - so those clearly have
a builtin printf as well (they behave the same way as bash, the NetBSD sh
and bosh).

It might be useful to know what the printf utility (the one
from the filesystem) outputs for
/path/to/printf '%d\n' 0xffffc00000000000
on Solaris, AIX, HPUX, Linux, MacOS, and anything else
similar anyone can test that on.   If you get
18446673704965373952
and no error message, then please try with more 0's appended
to actually force overflow to happen.

kre

ps: the FreeBSD sh may have its own builtin printf when running on FreeBSD,
but the version I have either seems not to have printf built in, or built
in the NetBSD printf (or happens to have something that acts identically).
I could check the sources, and the way I built it, but it isn't really 
important.



Re: printf (the utility) expected range of integer values

2020-10-24 Thread Robert Elz via austin-group-l at The Open Group
Date:Sat, 24 Oct 2020 16:47:41 +0000 (UTC)
From:shwaresyst 
Message-ID:  <160402159.2963847.1603558061...@mail.yahoo.com>

  | The text relevant to all this I see is the paragraph at line 104150,
  | page 3114, c181.pdf,

That is the text I quoted in the previous message (I got it from 202x d1.1
but that's irrelevant, the page & line numbers have changed, but the words
are the same).  For reference, here it is again:

 If an argument operand cannot be completely converted into an internal
 value appropriate to the corresponding conversion specification, a
 diagnostic message shall be written to standard error and the utility
 shall not exit with a zero exit status, but shall continue processing
 any remaining operands and shall write the value accumulated at the
 time the error was detected to standard output.

  | which limits outputs to the internal representation range of
  | the format characters used, converted back to text.

Yes.   But what does that actually mean to someone who wants to use
printf (the utility) and wants to be sure it will be able to print the
numbers needed?   Could an implementor represent integers as an internal
form with 0 bits (in which the only value that doesn't overflow is 0)
and hence always print 0 for any %d (%u/%x/%d) conversion, with an error
message about overflow for any value with any bits set?

If not, what text in the standard prohibits that?   We know it can't happen
for printf(3) (XSH.3.fprintf) as the minimum size of a C int (in POSIX)
is 32 bits.   But where is the required range of printf(1) (XCU.3.printf)
integers stated?   Surely not nowhere?

  | This should probably be explicit that the conversion shall detect
  | overflows,

It is, particularly when combined with what is in the APPLICATION USAGE
section.  In c181 see page 3115, the paragraph that starts at line 104190:

If an argument cannot be parsed correctly for the corresponding
conversion specification, the printf utility is required to report
an error. Thus, overflow and extraneous characters at the end
of an argument being used for a numeric conversion shall be reported
as errors.

This part isn't a problem, or an issue; it is quite clear, and (aside
from ksh93, which is obviously broken) it is what everything I tested does.

Now back to the questions from the original message, neither of which did
you even attempt to answer.

Where, if anywhere, is it stated what range of integers is required to be
supported by printf the utility?  Or in other words, is there a smallest
value which is permitted to generate an overflow (for present purposes just
consider positive numbers, we can all easily extrapolate to negative when
appropriate.)   Further, and related, is there any value which is required
to be treated as overflow (perhaps related to something in <limits.h> rather
than an absolute constant in the printf page)?  And if so, where is that
stated?

For this, remember that printf the utility has no length modifiers for the
numeric conversions (at least the integer ones, the floats aren't required
at all, so obviously nothing is there to distinguish float from double, etc).
That is, there is only one "kind" of integer that it is able to print, a
simple %d (or %u %x %o), there is no %ld %jd %zd %lld ...

And second, when an overflow does occur, and an error message is printed to
stderr (and the eventual exit status from printf when it completes is set to
something greater than 0) then, as required, printf is still required to
print a value for the conversion that overflowed.   What value should be
printed - the maximum that could be handled, which is the common result
(presumably because almost everyone is using strtoll() to convert the
input char string to the internal representation, and that is what strtoll()
is defined to return (in addition to an error indication) when it encounters
overflow)?   But that's not what the standard actually says should be
printed, what it says is something much more like what zsh does (see the
previous message).

I thank you for taking the time to reply, but I'd prefer it if you actually
read the e-mail first, and answered the questions in it, rather than just
sending random related thoughts.

kre



RE: printf (the utility) expected range of integer values

2020-10-24 Thread shwaresyst via austin-group-l at The Open Group

The text relevant to all this I see is the paragraph at line 104150, page 3114, 
c181.pdf, which limits outputs to the internal representation range of the 
format characters used, converted back to text. This should probably be 
explicit that the conversion shall detect overflows, positive or negative, when 
converting input text, and to treat this as an error. While the C standard 
permits silent overflows in converting C source this makes the utility 
non-portable.
On Saturday, October 24, 2020 Robert Elz via austin-group-l at The Open Group 
 wrote:
Is there somewhere, anywhere, where it is possible to infer what
range of values printf (the utility, not the C library function)
is expected to handle?

I can find nothing in the XCU 3.printf page, nor in XBD 5 (and also
not in XBD 12, which would be another plausible place).  There doesn't
seem to be anything about integers at all in XBD 3.

XBD 14.limits.h gives the minimum allowed value for the maximum value
of an integer (2^31 - 1) (ie: requires at least 32 bit int), but I can
find nothing that says explicitly that that applies to printf the utility.

Is there some expected minimum integer size for printf (the utility)
that is actually specified somewhere?

Further, since printf (the utility) is really just converting text
strings from one format to another, there's really no reason that there
needs to be any limit at all - there's no particular reason that integers
thousands of digits long couldn't be handled.  The standard does say that
if overflow occurs, an error message, and non-zero exit status, must
occur, but it doesn't ever say that overflow must occur.

Second question - if overflow does occur (at whatever point) what is the
value that must be printed (in addition to the error message) from a
numeric conversion.

Given a printf that uses 64 bit integers (which seems to be a very common
choice) then what should be printed from

    printf '%d\n' 0xffffc00000000000

?

(This is the example that made me think about all of this - we (NetBSD)
have been offered a patch to make the error message go away, and the
result be:
    -70368744177664
That is, treating the value as a bit pattern for the 64 bits, which then
has the sign bit set, and so prints as a negative value.)

We will not be doing that.
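
For clarity, the reinterpretation that the patch amounts to can be sketched
in Python.  This assumes the operand is 0xffffc00000000000 (the value
consistent with the -70368744177664 shown above); as_signed64 is a
hypothetical helper, not code from the patch.

```python
import struct

def as_signed64(unsigned_value):
    """Reinterpret a 64-bit unsigned bit pattern as a two's complement value."""
    return struct.unpack("<q", struct.pack("<Q", unsigned_value))[0]

print(as_signed64(0xffffc00000000000))  # -70368744177664
```

The same 64 bits read as unsigned give 18446673704965373952, so no digit is
lost, but the sign of what %d prints no longer matches the operand.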

But what should we print?  (In addition to the error).

Every shell I tested (with 2 exceptions) does:

printf '%d\n' 0xffffc00000000000
-bash: printf: warning: 0xffffc00000000000: Result too large or too small
9223372036854775807

That one, obviously, is from bash.  Note that the "every shell" for this
is not all that meaningful, many don't have printf built in, and so are
simply running the NetBSD filesystem printf utility .. so it isn't then
surprising that they all do the exact same thing as that does!  But it
is obvious that at least the NetBSD sh, bash, bosh, zsh, and ksh93 have
a builtin printf (the error messages differ...)

But that value might not be what the standard calls for (even though it
is what almost everyone does), what the standard says is:

    If an argument operand cannot be completely converted into an internal
    value appropriate to the corresponding conversion specification, a
    diagnostic message shall be written to standard error and the utility
    shall not exit with a zero exit status, but shall continue processing
    any remaining operands and shall write the value accumulated at the
    time the error was detected to standard output.

The question is, what is "the value accumulated at the time the error was
detected".

What zsh does is:

    zsh $ printf '%d\n' 0xffffc00000000000
    zsh: number truncated after 15 digits: ffffc00000000000
    1152917106560335872

which makes some sense to me; I had been thinking this might be the
correct value, before I started testing to see what was produced.
That is, after the first 15 hex digits are consumed, the value is
0xffffc0000000000 (1152917106560335872 in decimal), and then when an
attempt is made to add one more zero, we detect the overflow, and so the
value that had been accumulated when the overflow was detected was
1152917106560335872 (when printed via %d).

The value "everybody" else prints, 9223372036854775807, is simply 2^63-1
(the max possible value) which most likely was never actually encountered
during the conversion, but is just what strtoll() returns as its value.

kre

ps: the other shell which didn't produce 9223372036854775807 was ksh93,
which actually does
    ksh93 $ printf '%d\n' 0xffffc00000000000
    -70368744177664
Sad that.  Good thing that we don't use ksh as the basis of the standard!




printf (the utility) expected range of integer values

2020-10-24 Thread Robert Elz via austin-group-l at The Open Group
Is there somewhere, anywhere, where it is possible to infer what
range of values printf (the utility, not the C library function)
is expected to handle?

I can find nothing in the XCU 3.printf page, nor in XBD 5 (and also
not in XBD 12, which would be another plausible place).  There doesn't
seem to be anything about integers at all in XBD 3.

XBD 14.limits.h gives the minimum allowed value for the maximum value
of an integer (2^31 - 1) (ie: requires at least 32 bit int), but I can
find nothing that says explicitly that that applies to printf the utility.

Is there some expected minimum integer size for printf (the utility)
that is actually specified somewhere?

Further, since printf (the utility) is really just converting text
strings from one format to another, there's really no reason that there
needs to be any limit at all - there's no particular reason that integers
thousands of digits long couldn't be handled.  The standard does say that
if overflow occurs, an error message, and non-zero exit status, must
occur, but it doesn't ever say that overflow must occur.
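
To make the "no limit" point concrete, here is a minimal sketch of a
text-to-text conversion that never overflows.  It is written in Python
only because its integers are unbounded; the name to_decimal is made up
for illustration, and this is not how any existing printf is implemented.

```python
def to_decimal(operand: str) -> str:
    """Render a C-style integer constant as decimal text, with no size limit."""
    sign = 1
    digits = operand
    if digits[:1] in ("+", "-"):
        # Remember the sign and convert the magnitude separately.
        sign = -1 if digits[0] == "-" else 1
        digits = digits[1:]
    if digits.lower().startswith("0x"):
        value = int(digits[2:], 16)   # hexadecimal constant
    elif digits.startswith("0") and len(digits) > 1:
        value = int(digits[1:], 8)    # C-style octal constant
    else:
        value = int(digits, 10)       # plain decimal
    return str(sign * value)


print(to_decimal("0xffffc00000000000"))  # 18446673704965373952
```

With a conversion like this, the overflow case the standard describes
would simply never arise for %d, which is why it matters whether the
standard actually requires it to.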

Second question - if overflow does occur (at whatever point) what is the
value that must be printed (in addition to the error message) from a
numeric conversion?

Given a printf that uses 64 bit integers (which seems to be a very common
choice) then what should be printed from

    printf '%d\n' 0xffffc00000000000

?

(This is the example that made me think about all of this - we (NetBSD)
have been offered a patch to make the error message go away, and the
result be:
    -70368744177664
That is, treating the value as a bit pattern for the 64 bits, which then
has the sign bit set, and so prints as a negative value.)
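
The arithmetic behind that negative number can be checked with a small
sketch (Python, with a hypothetical helper name as_signed64; it only
models the reinterpretation, it is not the offered patch):

```python
def as_signed64(bits: int) -> int:
    """Reinterpret the low 64 bits of `bits` as a two's-complement value."""
    bits &= (1 << 64) - 1              # keep only 64 bits
    if bits >> 63:                     # sign bit set: value is negative
        return bits - (1 << 64)
    return bits


print(as_signed64(0xffffc00000000000))  # -70368744177664
```

The operand is 2^64 - 2^46, so reading it as a signed 64 bit pattern
gives -2^46, which is the value shown above.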

We will not be doing that.

But what should we print?  (In addition to the error).

Every shell I tested (with 2 exceptions) does:

    printf '%d\n' 0xffffc00000000000
    -bash: printf: warning: 0xffffc00000000000: Result too large or too small
    9223372036854775807

That one, obviously, is from bash.  Note that "every shell" is not all
that meaningful here: many don't have printf built in, and so are simply
running the NetBSD filesystem printf utility, so it isn't surprising
that they all do exactly what it does!  But it is obvious that at least
the NetBSD sh, bash, bosh, zsh, and ksh93 have a builtin printf (the
error messages differ...)

But that value might not be what the standard calls for (even though it
is what almost everyone does).  What the standard says is:

    If an argument operand cannot be completely converted into an internal
    value appropriate to the corresponding conversion specification, a
    diagnostic message shall be written to standard error and the utility
    shall not exit with a zero exit status, but shall continue processing
    any remaining operands and shall write the value accumulated at the
    time the error was detected to standard output.

The question is, what is "the value accumulated at the time the error was
detected"?

What zsh does is:

    zsh $ printf '%d\n' 0xffffc00000000000
    zsh: number truncated after 15 digits: ffffc00000000000
    1152917106560335872

which makes some sense to me; I had been thinking this might be the
correct value, before I started testing to see what was produced.
That is, after the first 15 hex digits are consumed, the value is
0xffffc0000000000 (1152917106560335872 in decimal), and then when an
attempt is made to add one more zero, we detect the overflow, and so the
value that had been accumulated when the overflow was detected was
1152917106560335872 (when printed via %d).
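
That reading of "the value accumulated" can be sketched as a
digit-by-digit conversion that stops before the digit that would
overflow.  This is a model in Python with made-up names (INT64_MAX,
accumulate_hex), assuming a signed 64 bit internal value; it is not
taken from the zsh source.

```python
INT64_MAX = (1 << 63) - 1


def accumulate_hex(digits: str) -> tuple[int, int]:
    """Accumulate hex digits one at a time.

    Returns (value, digits_consumed), stopping before the first digit
    that would push the value past INT64_MAX.
    """
    value = 0
    for consumed, ch in enumerate(digits):
        candidate = value * 16 + int(ch, 16)
        if candidate > INT64_MAX:
            # Overflow detected: keep what had been accumulated so far.
            return value, consumed
        value = candidate
    return value, len(digits)


print(accumulate_hex("ffffc00000000000"))  # (1152917106560335872, 15)
```

The 15 digits consumed match the "truncated after 15 digits" diagnostic,
and the value matches what zsh printed.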

The value "everybody" else prints, 9223372036854775807, is simply 2^63-1
(the max possible value) which most likely was never actually encountered
during the conversion, but is just what strtoll() returns as its value.
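
For comparison, here is a sketch of the saturating behaviour.  The C
standard has strtoll() return LLONG_MAX or LLONG_MIN and set errno to
ERANGE when the value is out of range; the Python function below
(hypothetical name strtoll_model, hex input only) just models that
clamping and reports the range error as a flag.

```python
LLONG_MAX = (1 << 63) - 1
LLONG_MIN = -(1 << 63)


def strtoll_model(text: str) -> tuple[int, bool]:
    """Model strtoll(text, NULL, 16): return (result, range_error)."""
    exact = int(text, 16)          # exact value, no size limit
    if exact > LLONG_MAX:
        return LLONG_MAX, True     # clamp high, as with ERANGE
    if exact < LLONG_MIN:
        return LLONG_MIN, True     # clamp low, as with ERANGE
    return exact, False


print(strtoll_model("0xffffc00000000000"))  # (9223372036854775807, True)
```

The clamped result depends only on the sign of the operand, not on its
digits, which is why it is hard to call it a value "accumulated" during
the conversion.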

kre

ps: the other shell which didn't produce 9223372036854775807 was ksh93,
which actually does
    ksh93 $ printf '%d\n' 0xffffc00000000000
    -70368744177664
Sad that.  Good thing that we don't use ksh as the basis of the standard!