Re: [Python-Dev] Two proposed changes to float formatting
Mark Dickinson wrote:
> On Sun, Apr 26, 2009 at 10:42 PM, Scott David Daniels wrote:
> ...
>> I had also said (without explaining):
>>>> only the trailing zeroes with the e, so we wind up with:
>>>>    1157920892373161954235709850086879078532699846656405640e+23
>>>> or 115792089237316195423570985008687907853269984665640564.0e+24
>>>> or some such, rather than
>>>>    1.157920892373162e+77
>>>> or 1.15792089237316195423570985008687907853269984665640564e+77
>> These are all possible representations for 2 ** 256.
>
> Understood.
>
>> _but_ the printed decimal number I am proposing is within one ULP of
>> the value of the binary number.
>
> But there are plenty of ways to get this if this is what you want: if
> you want a displayed result that's within 1 ulp (or 0.5 ulps, which
> would be better) of the true value then repr should serve your needs.

The representation I am suggesting here is a halfway measure between your proposal and the existing behavior. It addresses the abrupt transition that you point out (the number of significant digits drops precipitously) without particularly changing the goal of the transition (avoiding a display of faux accuracy) and without, in my (possibly naive) view, seriously complicating either the print-generating code or the issues for the reader of the output.

To wit, the proposal is
(A) for numbers where the printed digits exceed the accuracy presented, represent the result as an integer with an e+N, rather than as a number between 1 and 10-epsilon with an exponent that makes you count digits to compare two values, and
(B) that the full precision available in the value be shown in the representation.

Given that everyone understands that this is what I am proposing, I am OK with the decision going where it will. I am comforted that we are only talking about four wrapped lines if we go to the full integer, which I had not realized.

Further, I agree with you that there is an abrupt transition in represented accuracy as we cross from %f to %g, and that it should somehow be addressed. You want to address it by continuing to show digits, and I want to limit the digits shown to a value that reflects the known accuracy. I also want text that compares "smoothly" with numbers near the transition (so that greater-than and less-than relationships are obvious without thinking, hence the representation that avoids the "normalized" mantissa).

Having said all this, I think my compromise position should be clear. I did not mean to argue with you, but rather intended to propose a possible middle way that some might find appealing.

--Scott David Daniels
scott.dani...@acm.org
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Two proposed changes to float formatting
Mark Dickinson wrote:
> (1) Currently, '%f' formatting automatically changes
> to '%g' formatting for numbers larger than 1e50. For
> example:
>
> >>> '%f' % 2**166.
> '93536104789177786765035829293842113257979682750464.000000'
> >>> '%f' % 2**167.
> '1.87072e+50'
>
> I propose removing this feature for 3.1

I don't think we've stated it in this discussion, but I know from private email with Mark that his proposal is for both %-formatting and float.__format__ to have this change. I just want to get it on the record here.

Eric.
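To make the "both spellings" point concrete, here is a small sketch comparing %-formatting with format() and str.format() for a value just above the old cutoff. It assumes a Python release in which the '%f' to '%g' switch has already been removed (3.1 or later); the variable names are only illustrative.

```python
# Assumption: Python 3.1+ where '%f' no longer falls back to '%g' above 1e50.
x = 2.0 ** 167                      # roughly 1.87e50, just past the old limit

percent_style = '%f' % x            # printf-style formatting
format_builtin = format(x, 'f')     # goes through float.__format__
str_format = '{:f}'.format(x)       # also goes through float.__format__

# 2.0**167 is exactly representable, so the digits must match the integer 2**167.
expected = str(2 ** 167) + '.000000'

print(percent_style == format_builtin == str_format == expected)
```

If the two code paths ever disagreed, the comparison would print False, which is the situation the proposal is meant to rule out.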
Re: [Python-Dev] Two proposed changes to float formatting
>> it would make more sense to me if it stayed fixed at 56 sig digits
>> for numbers larger than 1e50.
>
> So I agree with this, even if the default # of sig digits were less.

Several reasons to accept Mark's proposal:

* It matches what C does, and many languages tend to copy the C standards with respect to format codes. Matching other languages helps in porting code, copying algorithms, and mentally switching back and forth when working in multiple languages.

* When a programmer has chosen %f, that means that they have consciously rejected choosing %e or %g. It is generally best to have the code do what the programmer asked for ;-)

* Code that tested well with 1e47, 1e48, 1e49, and 1e50 suddenly shifts behavior with 1e51. Behavior shifts like that are bug bait.

* The 56 significant digits may be rooted in the longest decimal expansion of a 53 bit float. For example, len(str(Decimal.from_float(.1))) is 57 including the leading zero. But not all machines (now, in the past, or in the future) use 53 bits for the significand.

* Use of exponents is common but not universal. Some converters for SQL specs like Decimal(10,80) may not recognize the e-notation. The xmlrpc spec only accepts decimal expansions, not %e notation.

* The programmer needs to have some way to spell out a decimal expansion when needed. Currently, %f is the only way.

Raymond
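The length quoted for the expansion of 0.1 can be checked directly. This is only a sketch of that one data point, assuming a platform whose floats are IEEE 754 doubles; it says nothing about where the figure 56 actually came from.

```python
from decimal import Decimal

# Exact decimal expansion of the double nearest to 0.1.
exact = str(Decimal.from_float(0.1))

total_length = len(exact)                    # counts the leading "0" and the "."
digits_after_point = len(exact.split('.')[1])

print(exact)
print(total_length, digits_after_point)
```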
Re: [Python-Dev] Two proposed changes to float formatting
On Sun, Apr 26, 2009 at 10:42 PM, Scott David Daniels wrote:
> I had also said (without explaining):
>> > only the trailing zeroes with the e, so we wind up with:
>> >    1157920892373161954235709850086879078532699846656405640e+23
>> > or 115792089237316195423570985008687907853269984665640564.0e+24
>> > or some such, rather than
>> >    1.157920892373162e+77
>> > or 1.15792089237316195423570985008687907853269984665640564e+77
> These are all possible representations for 2 ** 256.

Understood.

> _but_ the printed decimal number I am proposing is within one ULP of
> the value of the binary number.

But there are plenty of ways to get this if this is what you want: if you want a displayed result that's within 1 ulp (or 0.5 ulps, which would be better) of the true value then repr should serve your needs. If you want more control over the number of significant digits then '%g' formatting gives that, together with a nice-looking output for small numbers.

It's only '%f' formatting that I'm proposing changing: I see a '%.2f' formatting request as a very specific, precise one: give me exactly 2 digits after the point---no more, no less, and it seems wrong and arbitrary that this request should be ignored for numbers larger than 1e50 in absolute value.

That is, for general float formatting needs, use %g, str and repr. %e and %f are for when you want fine control.

> That is, the majority of the digits
> in int(1e308) are a fiction

Not really: the float that Python stores has a very specific value, and the '%f' formatting is showing exactly that value. (Yes, I know that some people advocate viewing a float as a range of values rather than a specific value; but I'm pretty sure that that's not the way that the creators of IEEE 754 were thinking.)

> zeros get taken off the representation. The reason I don't care is
> that the code for getting a floating point value is tricky, and I
> suspect the printing code might not easily be able to distinguish
> between a significant trailing zero and fictitious bits.

As of 3.1, the printing code should be fine: it's using David Gay's 'perfect rounding' code, so what's displayed should be correctly rounded to the requested precision.

Mark
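The division of labour described above (repr for a faithful round trip, '%g' for a chosen number of significant digits, '%f' for a fixed number of digits after the point) can be illustrated with a short sketch. It assumes Python 3.1 or later, where repr is the shortest round-tripping string and '%f' no longer switches to '%g'.

```python
x = 2.0 ** 256

# repr: enough digits to recover exactly the same float.
as_repr = repr(x)
recovered = float(as_repr)

# '%g': the caller picks the number of significant digits.
three_sig = '%.3g' % x

# '%.2f': exactly two digits after the point, even for a huge value.
two_places = '%.2f' % 1e60
fraction_part = two_places.split('.')[1]

print(as_repr, three_sig, len(fraction_part))
```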
Re: [Python-Dev] Two proposed changes to float formatting
Mark Dickinson wrote:
> On Sun, Apr 26, 2009 at 8:11 PM, Scott David Daniels wrote:
>> As a user of Idle, I would not like to see the change you seek of
>> having %f stay full-precision. When a number gets too long to print
>> on a single line, the wrap depends on the current window width, and
>> is calculated dynamically. One section of the display with a 8000
>> -digit (100-line) text makes Idle slow to scroll around in. It is
>> too easy for numbers to go massively positive in a bug.

I had also said (without explaining):
>> only the trailing zeroes with the e, so we wind up with:
>>    1157920892373161954235709850086879078532699846656405640e+23
>> or 115792089237316195423570985008687907853269984665640564.0e+24
>> or some such, rather than
>>    1.157920892373162e+77
>> or 1.15792089237316195423570985008687907853269984665640564e+77
These are all possible representations for 2 ** 256.

> I see your point. Since we're talking about floats, though, there
> should never be more than 316 characters in a '%f' % x: the
> largest float is around 1.8e308, giving 308 digits before the
> point, 6 after, a decimal point, and possibly a minus sign.
> (Assuming that your platform uses IEEE 754 doubles.)

You are correct that I had not thought long and hard about that. 308 is livable, if not desirable. I was remembering accidentally displaying the result of a factorial call.

>> However, this is, I agree, a problem. Since all of these numbers
>> should end in a massive number of zeroes
>
> But they typically don't end in zeros (except the six zeros following
> the point), because they're stored in binary rather than decimal

_but_ the printed decimal number I am proposing is within one ULP of the value of the binary number. That is, the majority of the digits in int(1e308) are a fiction -- they could just as well be the digits of int(1e308) + int(1e100), because
    1e308 + 1e100 == 1e308
That is the sense in which I say those digits in decimal are zeroes.

My proposal was to have the integer part of the expansion be a representation of the accuracy of the number in a visible form. I chose the value I chose since a zero lies at the very end, and tried to indicate that I did not really care where trailing actual-accuracy zeros get taken off the representation. The reason I don't care is that the code for getting a floating point value is tricky, and I suspect the printing code might not easily be able to distinguish between a significant trailing zero and fictitious bits.

--Scott David Daniels
scott.dani...@acm.org
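Both readings of "those digits are a fiction" can be seen in a few lines. The sketch below assumes IEEE 754 doubles and uses math.ulp, which is only available in Python 3.9 and later; it is an illustration of the arithmetic, not part of either proposal.

```python
import math

big = 1e308

# Gap between 1e308 and the next representable float (Python 3.9+).
spacing = math.ulp(big)

# Anything much smaller than the gap is absorbed by float addition...
absorbed = (big + 1e100 == big)

# ...yet the stored float is still one specific integer, and it is not 10**308.
stored_is_power_of_ten = (int(big) == 10 ** 308)
integers_differ = (int(big) + int(1e100) != int(big))

print(spacing, absorbed, stored_is_power_of_ten, integers_differ)
```

The first result supports the view that digits below the spacing carry no information about the computation; the last two support the view that the float nevertheless has one exact value.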
Re: [Python-Dev] Two proposed changes to float formatting
Mark Dickinson wrote:
> I'd like to propose two minor changes to float and complex
> formatting, for 3.1. I don't think either change should prove
> particularly disruptive.
>
> (1) Currently, '%f' formatting automatically changes
> to '%g' formatting for numbers larger than 1e50. For
> example:
>
> >>> '%f' % 2**166.
> '93536104789177786765035829293842113257979682750464.000000'
> >>> '%f' % 2**167.
> '1.87072e+50'
>
> I propose removing this feature for 3.1
>
> More details: The current behaviour is documented (standard
> library->builtin types). (Until very recently, it was actually
> misdocumented as changing at 1e25, not 1e50.)
>
> """For safety reasons, floating point precisions are clipped to 50; %f
> conversions for numbers whose absolute value is over 1e50 are
> replaced by %g conversions. [5] All other errors raise exceptions."""
>
> There's even a footnote:
>
> """[5] These numbers are fairly arbitrary. They are intended to avoid
> printing endless strings of meaningless digits without hampering
> correct use and without having to know the exact precision of floating
> point values on a particular machine."""
>
> I don't find this particularly convincing, though---I just don't see
> a really good reason not to give the user exactly what she/he
> asks for here. I have a suspicion that at least part of the
> motivation for the '%f' -> '%g' switch is that it means the
> implementation can use a fixed-size buffer. But Eric has
> fixed this (in 3.1, at least) and the buffer is now dynamically
> allocated, so this isn't a concern any more.
>
> Other reasons not to switch from '%f' to '%g' in this way:
>
> - the change isn't gentle: as you go over the 1e50 boundary,
>   the number of significant digits produced suddenly changes
>   from 56 to 6;

Looking at your example, that jumped out at me as somewhat startling...

>   it would make more sense to me if it
>   stayed fixed at 56 sig digits for numbers larger than 1e50.

So I agree with this, even if the default # of sig digits were less. +1

> - now that we're using David Gay's 'perfect rounding'
>   code, we can be sure that the digits aren't entirely
>   meaningless, or at least that they're the 'right' meaningless
>   digits. This wasn't true before.
> - C doesn't do this, and the %f, %g, %e formats really
>   owe their heritage to C.
> - float formatting is already quite complicated enough; no
>   need to add to the mental complexity
> - removal simplifies the implementation :-)
>
> On to the second proposed change:
>
> (2) complex str and repr don't behave like float str and repr, in that
> the float version always adds a trailing '.0' (unless there's an
> exponent), but the complex version doesn't:
>
> >>> 4., 10.
> (4.0, 10.0)
> >>> 4. + 10.j
> (4+10j)
>
> I propose changing the complex str and repr to behave like the
> float version. That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
> rather than "(4+10j)".
>
> Mostly this is just about consistency, ease of implementation,
> and aesthetics. As far as I can tell, the extra '.0' in the float
> repr serves two closely-related purposes: it makes it clear to the
> human reader that the number is a float rather than an integer,
> and it makes sure that e.g., eval(repr(x)) recovers a float
> rather than an int. The latter point isn't a concern for
> the current complex repr, but the former is: 4+10j looks to
> me more like a Gaussian integer than a complex number.

I agree. A complex is, alternatively, an ordered pair of floats. A different, number-theory oriented implementation of Python might even want to read 4+10j as a Gaussian integer.

tjr
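The two purposes attributed to the trailing '.0' can be checked with a short sketch. It deliberately makes no claim about what the complex repr looks like, since that is exactly what proposal (2) is about and may differ between Python versions; it only confirms that the complex repr round-trips to a complex either way.

```python
x = 4.0
z = 4.0 + 10.0j

# The float repr keeps a trailing '.0', so eval() yields a float, not an int.
float_repr = repr(x)
float_round_trip = eval(float_repr)

# The complex repr round-trips to a complex whether or not it shows '.0'.
complex_repr = repr(z)
complex_round_trip = eval(complex_repr)

print(float_repr, type(float_round_trip).__name__)
print(complex_repr, type(complex_round_trip).__name__)
```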
Re: [Python-Dev] Two proposed changes to float formatting
On Sun, Apr 26, 2009 at 8:11 PM, Scott David Daniels wrote:
> As a user of Idle, I would not like to see the change you seek of
> having %f stay full-precision. When a number gets too long to print
> on a single line, the wrap depends on the current window width, and
> is calculated dynamically. One section of the display with a 8000
> -digit (100-line) text makes Idle slow to scroll around in. It is
> too easy for numbers to go massively positive in a bug.

I see your point. Since we're talking about floats, though, there should never be more than 316 characters in a '%f' % x: the largest float is around 1.8e308, giving 308 digits before the point, 6 after, a decimal point, and possibly a minus sign. (Assuming that your platform uses IEEE 754 doubles.)

> However, this is, I agree, a problem. Since all of these numbers
> should end in a massive number of zeroes

But they typically don't end in zeros (except the six zeros following the point), because they're stored in binary rather than decimal. For example:

>>> int(1e308)
100000000000000001097906362944045541740492309677311846336810682903157585404911491537163328978494688899061249669721172515611590283743140088328307009198146046031271664502933027185697489699588559043338384466165001178426897626212945177628091195786707458122783970171784415105291802893207873272974885715430223118336

Mark
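The size estimate and the "no trailing zeros" observation are easy to measure. The sketch below assumes IEEE 754 doubles and Python 3.1 or later. On such a platform the count should come out one higher than the 316 estimated above, because a value near 1.8e308 has 309 digits before the point; the conclusion that the output is bounded at a few hundred characters is unaffected.

```python
import sys

largest = sys.float_info.max            # about 1.8e308 for IEEE 754 doubles

# Longest possible '%f' output: minus sign, integer digits, point, six decimals.
longest = '%f' % -largest
integer_digits = len(str(int(largest)))

# int(1e308) is the exact stored value; count how many zeros it ends in.
digits = str(int(1e308))
trailing_zeros = len(digits) - len(digits.rstrip('0'))

print(len(longest), integer_digits, trailing_zeros)
```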
Re: [Python-Dev] Two proposed changes to float formatting
Mark Dickinson wrote:
> ...
> """[5] These numbers are fairly arbitrary. They are intended to avoid
> printing endless strings of meaningless digits without hampering
> correct use and without having to know the exact precision of floating
> point values on a particular machine."""
>
> I don't find this particularly convincing, though---I just don't see
> a really good reason not to give the user exactly what she/he
> asks for here.

As a user of Idle, I would not like to see the change you seek of having %f stay full-precision. When a number gets too long to print on a single line, the wrap depends on the current window width, and is calculated dynamically. One section of the display with a 8000-digit (100-line) text makes Idle slow to scroll around in. It is too easy for numbers to go massively positive in a bug.

> - the change isn't gentle: as you go over the 1e50 boundary,
>   the number of significant digits produced suddenly changes
>   from 56 to 6; it would make more sense to me if it
>   stayed fixed at 56 sig digits for numbers larger than 1e50.
> - now that we're using David Gay's 'perfect rounding'
>   code, we can be sure that the digits aren't entirely
>   meaningless, or at least that they're the 'right' meaningless
>   digits. This wasn't true before.

However, this is, I agree, a problem. Since all of these numbers should end in a massive number of zeroes, how about we replace only the trailing zeroes with the e, so we wind up with:
   1157920892373161954235709850086879078532699846656405640e+23
or 115792089237316195423570985008687907853269984665640564.0e+24
or some such, rather than
   1.157920892373162e+77
or 1.15792089237316195423570985008687907853269984665640564e+77

--Scott David Daniels
scott.dani...@acm.org
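One way to picture the "integer followed by e+N" idea is the sketch below. The function name integer_mantissa and the choice of 17 significant digits are assumptions made for illustration; the messages in this thread do not fix either, and this is not code from CPython.

```python
def integer_mantissa(x, sig_digits=17):
    """Hypothetical helper: render x as <integer>e<+N> with sig_digits digits.

    17 significant digits is an assumed cutoff; it is enough to round-trip
    any IEEE 754 double.
    """
    # '%.*e' yields d.ddd...e+XX with sig_digits significant digits in total.
    mantissa, exponent = ('%.*e' % (sig_digits - 1, x)).split('e')
    # Drop the point so the mantissa reads as an integer, then shift the exponent.
    digits = mantissa.replace('.', '')
    return '%se%+d' % (digits, int(exponent) - (sig_digits - 1))


print(integer_mantissa(2.0 ** 256))
```

Because the result is still a valid float literal, float(integer_mantissa(x)) gives back x, so nothing is lost relative to repr; only the placement of the decimal point changes.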
Re: [Python-Dev] Two proposed changes to float formatting
On Sun, Apr 26, 2009 at 5:59 PM, Eric Smith wrote:
> Mark Dickinson wrote:
>> I propose changing the complex str and repr to behave like the
>> float version. That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
>> rather than "(4+10j)".
>
> I'm +0.5 on this. I'd probably be +1 if I were a big complex user. Also, I'm
> not sure about the spaces around the sign. If we do want the spaces there,

Whoops. The spaces were a mistake: I'm not proposing to add those. I meant "(4.0+10.0j)" rather than "(4.0 + 10.0j)".

Mark
Re: [Python-Dev] Two proposed changes to float formatting
Mark Dickinson wrote:
> I'd like to propose two minor changes to float and complex
> formatting, for 3.1. I don't think either change should prove
> particularly disruptive.
>
> (1) Currently, '%f' formatting automatically changes
> to '%g' formatting for numbers larger than 1e50. For
> example:
> ...
> I propose removing this feature for 3.1

I'm +1 on this.

> I have a suspicion that at least part of the
> motivation for the '%f' -> '%g' switch is that it means the
> implementation can use a fixed-size buffer. But Eric has
> fixed this (in 3.1, at least) and the buffer is now dynamically
> allocated, so this isn't a concern any more.

I agree that this is a big part of the reason it was done. There's still some work to be done in the fallback code which we use if we can't use Gay's implementation of _Py_dg_dtoa. But it's reasonably easy to calculate the maximum buffer size needed given the precision, for passing on to PyOS_snprintf. (At least I think that sentence is true; I'll verify with Mark offline.)

> Other reasons not to switch from '%f' to '%g' in this way:
>
> - the change isn't gentle: as you go over the 1e50 boundary,
>   the number of significant digits produced suddenly changes
>   from 56 to 6; it would make more sense to me if it
>   stayed fixed at 56 sig digits for numbers larger than 1e50.

This is the big reason for me.

> - float formatting is already quite complicated enough; no
>   need to add to the mental complexity

And this, too.

> (2) complex str and repr don't behave like float str and repr, in that
> the float version always adds a trailing '.0' (unless there's an
> exponent), but the complex version doesn't:
> ...
> I propose changing the complex str and repr to behave like the
> float version. That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
> rather than "(4+10j)".

I'm +0.5 on this. I'd probably be +1 if I were a big complex user. Also, I'm not sure about the spaces around the sign. If we do want the spaces there, we can get rid of Py_DTSF_SIGN, since that's the only place it's used and we won't be able to use it for complex going forward.

Eric.
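The remark that the maximum buffer size follows from the precision can be sketched in Python. The helper max_f_length is hypothetical and is not the CPython fallback code; it assumes IEEE 754 doubles, finite inputs, and no width or flags in the format.

```python
import sys


def max_f_length(precision):
    """Hypothetical bound on len('%.*f' % (precision, x)) for finite doubles."""
    sign = 1                                          # room for a leading '-'
    integer_digits = sys.float_info.max_10_exp + 1    # 309 for IEEE 754 doubles
    point = 1 if precision > 0 else 0                 # no '.' is emitted for '%.0f'
    # A C buffer would need one more byte for the terminating NUL.
    return sign + integer_digits + point + precision


# The bound is reached by the most negative finite float.
worst_case = -sys.float_info.max
measured = {p: len('%.*f' % (p, worst_case)) for p in (0, 2, 6, 50)}
print(measured)
```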
Re: [Python-Dev] Two proposed changes to float formatting
Steven D'Aprano wrote:
> On Sun, 26 Apr 2009 08:06:56 pm Mark Dickinson wrote:
>> I'd like to propose two minor changes to float and complex
>> formatting, for 3.1. I don't think either change should prove
>> particularly disruptive.
>>
>> (1) Currently, '%f' formatting automatically changes to '%g'
>> formatting for numbers larger than 1e50.
> ...
>> I propose removing this feature for 3.1
>
> No objections from me. +1
>
>> I propose changing the complex str and repr to behave like the
>> float version. That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
>> rather than "(4+10j)".
>
> No objections here either. +0

Doing it sooner rather than later means that it is less likely to disrupt anyone relying on the representation (i.e. doctests).

Michael

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog
Re: [Python-Dev] Two proposed changes to float formatting
On Sun, 26 Apr 2009 08:06:56 pm Mark Dickinson wrote:
> I'd like to propose two minor changes to float and complex
> formatting, for 3.1. I don't think either change should prove
> particularly disruptive.
>
> (1) Currently, '%f' formatting automatically changes to '%g'
> formatting for numbers larger than 1e50.
...
> I propose removing this feature for 3.1

No objections from me. +1

> I propose changing the complex str and repr to behave like the
> float version. That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
> rather than "(4+10j)".

No objections here either. +0

--
Steven D'Aprano