Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Scott David Daniels

Mark Dickinson wrote:
> On Sun, Apr 26, 2009 at 10:42 PM, Scott David Daniels wrote:
>...
>> I had also said (without explaining):
>>> > only the trailing zeroes with the e, so we wind up with:
>>> >      1157920892373161954235709850086879078532699846656405640e+23
>>> >  or 115792089237316195423570985008687907853269984665640564.0e+24
>>> >  or some such, rather than
>>> >      1.157920892373162e+77
>>> >  or 1.15792089237316195423570985008687907853269984665640564e+77
>> These are all possible representations for 2 ** 256.
>
> Understood.
>
>> _but_ the printed decimal number I am proposing is within one ULP of
>> the value of the binary number.
>
> But there are plenty of ways to get this if this is what you want: if
> you want a displayed result that's within 1 ulp (or 0.5 ulps, which
> would be better) of the true value then repr should serve your needs.

The representation I am suggesting here is a half-way measure between
your proposal and the existing behavior.  This representation addresses
the abrupt transition that you point out (number of significant digits
drops precipitously) without particularly changing the goal of the
transition (displaying faux accuracy), without, in my (possibly naive)
view, seriously complicating either the print-generating code or the
issues for the reader of the output.

To wit, the proposal is (A) for numbers where the printed digits exceed
the accuracy presented, represent the result as an integer with an e+N,
rather than a number between 1 and 10-epsilon with an exponent that makes
you have to count digits to compare the two values, and (B) that the full
precision available in the value be shown in the representation.
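
A minimal sketch of representation (A), assuming Python 3.1 or later so
that '%e' is correctly rounded at any precision.  The name
integer_mantissa and its sig_digits parameter are hypothetical, chosen
only for illustration; how many digits count as "full precision" is left
as a parameter rather than decided here.

```python
def integer_mantissa(x, sig_digits=17):
    """Format x as '<integer>e<sign><N>' with sig_digits digits in the integer.

    Hypothetical helper for illustration, not part of Python.
    """
    # '%.*e' yields d.ddd...e+XX with sig_digits significant digits.
    mantissa, exponent = ('%.*e' % (sig_digits - 1, x)).split('e')
    # Dropping the point multiplies the mantissa by 10**(sig_digits - 1),
    # so the exponent shrinks by the same amount.
    digits = mantissa.replace('.', '')
    return '%se%+d' % (digits, int(exponent) - (sig_digits - 1))


x = 2.0 ** 256
print('%.15e' % x)               # normalized mantissa
print(integer_mantissa(x))       # 17 digits, then the exponent
print(integer_mantissa(x, 55))   # 55 digits, then the exponent
```

With 17 digits the string still converts back to the same float, which
is one possible reading of "full precision" for IEEE 754 doubles.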

Given that everyone understands that is what I am proposing, I am OK
with the decision going where it will.  I am comforted that we are only
talking about four wrapped lines if we go to the full integer,
which I had not realized.  Further, I agree with you that there is an
abrupt transition in represented accuracy as we cross from %f to %g,
that should be somehow addressed.  You want to address it by continuing
to show digits, and I want to limit the digits shown to a value that
reflects the known accuracy.  I also want text that compares "smoothly"
with numbers near the transition (so that greater-than and less-than
relationships are obvious without thinking), hence the representation
that avoids the "normalized" mantissa.

Having said all this, I think my compromise position should be clear.
I did not mean to argue with you, but rather intended to propose a
possible middle way that some might find appealing.

--Scott David Daniels
scott.dani...@acm.org

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Eric Smith

Mark Dickinson wrote:

> (1) Currently, '%f' formatting automatically changes to '%g' formatting for
> numbers larger than 1e50.  For example:
>
> >>> '%f' % 2**166.
> '93536104789177786765035829293842113257979682750464.000000'
> >>> '%f' % 2**167.
> '1.87072e+50'
>
> I propose removing this feature for 3.1


I don't think we've stated it on this discussion, but I know from 
private email with Mark that his proposal is for both %-formatting and 
for float.__format__ to have this change. I just want to get it on the 
record here.
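
A small check that the two code paths agree, assuming a Python 3.1 or
later interpreter (where, as far as I know, neither path switches to
exponent form).  The variable names are only for illustration.

```python
# Compare the formatting paths on a value just above 1e50.
x = 2.0 ** 167

percent_style = '%f' % x          # printf-style %-formatting
format_style = format(x, 'f')     # goes through float.__format__
method_style = '{:f}'.format(x)   # str.format also uses float.__format__

print(percent_style)
print(percent_style == format_style == method_style)
```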


Eric.



Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Raymond Hettinger



>> it would make more sense to me if it
>>    stayed fixed at 56 sig digits for numbers larger than 1e50.
>
> So I agree with this, even if the default # of sig digits were less.


Several reasons to accept Mark's proposal:

* It matches what C does and many languages tend to copy the
  C standards with respect to format codes.  Matching other
  languages helps in porting code, copying algorithms, and mentally
  switching back and forth when working in multiple languages.

* When a programmer has chosen %f, that means that they have
  consciously rejected choosing %e or %g.  It is generally best to
  have the code do what the programmer asked for ;-)

* Code that tested well with 1e47, 1e48, 1e49, and 1e50
  suddenly shifts behavior with 1e51.  Behavior shifts like that
  are bug bait.

* The 56 significant digits may be rooted in the longest
  decimal expansion of a 53 bit float.  For example,
  len(str(Decimal.from_float(.1))) is 57 including the leading
  zero.   But not all machines (now, in the past, or in the future)
  use 53 bits for the significand.

* Use of exponents is common but not universal.  Some converters
  for SQL specs like Decimal(10,80) may not recognize the
  e-notation.  The xmlrpc spec only accepts decimal expansions
  not %e notation.

* The programmer needs to have some way to spell-out a
  decimal expansion when needed.   Currently, %f is the only way.
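
The Decimal figure above can be reproduced directly.  This is only a
sketch, assuming IEEE 754 doubles; the last line also assumes Python 3.1
or later, where '%f' no longer falls back to exponent form.

```python
from decimal import Decimal

# Exact decimal expansion of the double nearest to 0.1.
exact = Decimal.from_float(0.1)
print(exact)
print(len(str(exact)))   # counts the leading "0" and the "." as well

# Fixed-point output for a large value: a plain decimal expansion.
print('%.1f' % 1e60)
```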


Raymond






Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Mark Dickinson
On Sun, Apr 26, 2009 at 10:42 PM, Scott David Daniels wrote:

> I had also said (without explaining):
>> > only the trailing zeroes with the e, so we wind up with:
>> >      1157920892373161954235709850086879078532699846656405640e+23
>> >  or 115792089237316195423570985008687907853269984665640564.0e+24
>> >  or some such, rather than
>> >      1.157920892373162e+77
>> >  or 1.15792089237316195423570985008687907853269984665640564e+77
> These are all possible representations for 2 ** 256.

Understood.

> _but_ the printed decimal number I am proposing is within one ULP of
> the value of the binary number.

But there are plenty of ways to get this if this is what you want: if
you want a displayed result that's within 1 ulp (or 0.5 ulps, which
would be better) of the true value then repr should serve your needs.
If you want more control over the number of significant digits then
'%g' formatting gives that, together with a nice-looking output for
small numbers.

It's only '%f' formatting that I'm proposing changing: I see a
'%.2f' formatting request as a very specific, precise one: give me
exactly 2 digits after the point---no more, no less, and it seems
wrong and arbitrary that this request should be ignored for
numbers larger than 1e50 in absolute value.

That is, for general float formatting needs, use %g, str and repr.
%e and %f are for when you want fine control.
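
To make the distinction concrete, here is a short sketch comparing the
three styles on one large value.  It assumes Python 3.1 or later, where
'%.2f' is honoured above 1e50.

```python
x = 1e60  # well above the old 1e50 cutoff

fixed = '%.2f' % x      # exactly two digits after the point
general = '%g' % x      # six significant digits by default, exponent if needed
shortest = repr(x)      # shortest string that round-trips

print(fixed)
print(general)
print(shortest)
```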

> That is, the majority of the digits
> in int(1e308) are a fiction

Not really: the float that Python stores has a very specific value,
and the '%f' formatting is showing exactly that value.  (Yes, I
know that some people advocate viewing a float as a range
of values rather than a specific value;  but I'm pretty sure that
that's not the way that the creators of IEEE 754 were thinking.)

> zeros get taken off the representation.  The reason I don't care is
> that the code for getting a floating point value is tricky, and I
> suspect the printing code might not easily be able to distinguish
> between a significant trailing zero and fictitious bits.

As of 3.1, the printing code should be fine:  it's using David
Gay's 'perfect rounding' code, so what's displayed should
be correctly rounded to the requested precision.
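
A quick way to see both points, sketched with the decimal module: the
float has one exact value, and '%f' prints that value rounded to the
requested number of places.  This assumes Python 3.1 or later and IEEE
754 doubles.

```python
from decimal import Decimal

x = 0.1

# Decimal(x) converts the stored binary value exactly, digit for digit.
exact = Decimal(x)

# Round that exact value to 20 places with the decimal module ...
rounded = exact.quantize(Decimal('1e-20'))

# ... and compare with what '%f' prints at the same precision.
printed = '%.20f' % x
print(exact)
print(printed)
print(printed == str(rounded))
```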

Mark


Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Scott David Daniels

Mark Dickinson wrote:
> On Sun, Apr 26, 2009 at 8:11 PM, Scott David Daniels wrote:
>> As a user of Idle, I would not like to see the change you seek of
>> having %f stay full-precision.  When a number gets too long to print
>> on a single line, the wrap depends on the current window width, and
>> is calculated dynamically.  One section of the display with an 8000-digit
>> (100-line) text makes Idle slow to scroll around in.  It is
>> too easy for numbers to go massively positive in a bug.
>
I had also said (without explaining):
> > only the trailing zeroes with the e, so we wind up with:
> >  1157920892373161954235709850086879078532699846656405640e+23
> >  or 115792089237316195423570985008687907853269984665640564.0e+24
> >  or some such, rather than
> >  1.157920892373162e+77
> >  or 1.15792089237316195423570985008687907853269984665640564e+77
These are all possible representations for 2 ** 256.

> I see your point.  Since we're talking about floats, though, there
> should never be more than 316 characters in a '%f' % x: the
> largest float is around 1.8e308, giving 308 digits before the
> point, 6 after, a decimal point, and possibly a minus sign.
> (Assuming that your platform uses IEEE 754 doubles.)
You are correct that I had not thought long and hard about that.
308 is livable, if not desirable.  I was remembering accidentally
displaying the result of a factorial call.

>> However, this is, I agree, a problem.  Since all of these numbers
>> should end in a massive number of zeroes
>
> But they typically don't end in zeros (except the six zeros following
> the point), because they're stored in binary rather than decimal.
_but_ the printed decimal number I am proposing is within one ULP of
the value of the binary number.  That is, the majority of the digits
in int(1e308) are a fiction -- they could just as well be the digits of
int(1e308) + int(1e100) because 1e308 + 1e100 == 1e308
That is the sense in which I say those digits in decimal are zeroes.
My proposal was to have the integer part of the expansion be a
representation of the accuracy of the number in a visible form.
I chose the value I chose since a zero lies at the very end, and
tried to indicate I did not really care where trailing actual accuracy
zeros get taken off the representation.  The reason I don't care is
that the code for getting a floating point value is tricky, and I
suspect the printing code might not easily be able to distinguish
between a significant trailing zero and fictitious bits.
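
The 1e308 + 1e100 == 1e308 observation can be checked as follows.  This
is a sketch assuming IEEE 754 doubles; math.ulp needs Python 3.9 or
later.

```python
import math

big = 1e308

# Spacing between adjacent doubles near 1e308.
spacing = math.ulp(big)
print(spacing)

# 1e100 is far below that spacing, so adding it changes nothing.
print(big + 1e100 == big)

# As exact integers the two values differ, but both round to the same double.
print(int(1e308) + int(1e100) == int(1e308))
print(float(int(1e308) + int(1e100)) == big)
```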

--Scott David Daniels
scott.dani...@acm.org



Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Terry Reedy

Mark Dickinson wrote:

> I'd like to propose two minor changes to float and complex
> formatting, for 3.1.  I don't think either change should prove
> particularly disruptive.
>
> (1) Currently, '%f' formatting automatically changes to '%g' formatting for
> numbers larger than 1e50.  For example:
>
> >>> '%f' % 2**166.
> '93536104789177786765035829293842113257979682750464.000000'
> >>> '%f' % 2**167.
> '1.87072e+50'
>
> I propose removing this feature for 3.1

> More details: The current behaviour is documented (standard
> library->builtin types).  (Until very recently, it was actually
> misdocumented as changing at 1e25, not 1e50.)
>
> """For safety reasons, floating point precisions are clipped to 50; %f
> conversions for numbers whose absolute value is over 1e50 are
> replaced by %g conversions. [5] All other errors raise exceptions."""
>
> There's even a footnote:
>
> """[5]   These numbers are fairly arbitrary. They are intended to
> avoid printing endless strings of meaningless digits without
> hampering correct use and without having to know the exact
> precision of floating point values on a particular machine."""
>
> I don't find this particularly convincing, though---I just don't see
> a really good reason not to give the user exactly what she/he
> asks for here.  I have a suspicion that at least part of the
> motivation for the '%f' -> '%g' switch is that it means the
> implementation can use a fixed-size buffer.  But Eric has
> fixed this (in 3.1, at least) and the buffer is now dynamically
> allocated, so this isn't a concern any more.
>
> Other reasons not to switch from '%f' to '%g' in this way:
>
>  - the change isn't gentle:  as you go over the 1e50 boundary,
>    the number of significant digits produced suddenly changes
>    from 56 to 6;


Looking at your example, that jumped out at me as somewhat startling...


>    it would make more sense to me if it
>    stayed fixed at 56 sig digits for numbers larger than 1e50.


So I agree with this, even if the default # of sig digits were less.
+1


>  - now that we're using David Gay's 'perfect rounding'
>    code, we can be sure that the digits aren't entirely
>    meaningless, or at least that they're the 'right' meaningless
>    digits.  This wasn't true before.
>  - C doesn't do this, and the %f, %g, %e formats really
>    owe their heritage to C.
>  - float formatting is already quite complicated enough; no
>    need to add to the mental complexity
>  - removal simplifies the implementation :-)


> On to the second proposed change:
>
> (2) complex str and repr don't behave like float str and repr, in that
> the float version always adds a trailing '.0' (unless there's an
> exponent), but the complex version doesn't:
>
> >>> 4., 10.
> (4.0, 10.0)
> >>> 4. + 10.j
> (4+10j)
>
> I propose changing the complex str and repr to behave like the
> float version.  That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
> rather than "(4+10j)".
>
> Mostly this is just about consistency, ease of implementation,
> and aesthetics.  As far as I can tell, the extra '.0' in the float
> repr serves two closely-related purposes:  it makes it clear to
> the human reader that the number is a float rather than an
> integer, and it makes sure that e.g., eval(repr(x)) recovers a
> float rather than an int.  The latter point isn't a concern for
> the current complex repr, but the former is:  4+10j looks to
> me more like a Gaussian integer than a complex number.


I agree.  A complex is alternately an ordered pair of floats.  A 
different, number-theory oriented implementation of Python might even 
want to read 4+10j as a Gaussian integer.
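
The two purposes of the trailing '.0' can be illustrated with a short
sketch.  How a complex value prints depends on the interpreter version,
so no particular output is claimed for that line.

```python
z = 4. + 10.j

# Floats keep a trailing '.0' so the text reads back as a float.
print(repr(4.), type(eval(repr(4.))).__name__)

# For complex, the 'j' suffix already forces a complex on the way back,
# whether or not the components are printed with '.0'.
print(repr(z), type(eval(repr(z))).__name__)
print(eval('(4.0+10.0j)') == eval('(4+10j)') == z)

# The components themselves are ordinary floats.
print(type(z.real).__name__, type(z.imag).__name__)
```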


tjr



Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Mark Dickinson
On Sun, Apr 26, 2009 at 8:11 PM, Scott David Daniels wrote:
> As a user of Idle, I would not like to see the change you seek of
> having %f stay full-precision.  When a number gets too long to print
> on a single line, the wrap depends on the current window width, and
> is calculated dynamically.  One section of the display with an 8000-digit
> (100-line) text makes Idle slow to scroll around in.  It is
> too easy for numbers to go massively positive in a bug.

I see your point.  Since we're talking about floats, though, there
should never be more than 316 characters in a '%f' % x: the
largest float is around 1.8e308, giving 308 digits before the
point, 6 after, a decimal point, and possibly a minus sign.
(Assuming that your platform uses IEEE 754 doubles.)
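
The bound is easy to measure.  In this sketch, which assumes IEEE 754
doubles and Python 3.1 or later, the largest double turns out to have
309 digits before the point (1.8e308 is a 309-digit integer), so the
total comes to 317 characters rather than 316; either way the output
stays short.

```python
import sys

largest = sys.float_info.max          # about 1.8e308 for IEEE 754 doubles
text = '%f' % -largest                # include the minus sign

integer_part, fraction_part = text.lstrip('-').split('.')
print(len(integer_part), len(fraction_part), len(text))
```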

> However, this is, I agree, a problem.  Since all of these numbers
> should end in a massive number of zeroes

But they typically don't end in zeros (except the six zeros following
the point), because they're stored in binary rather than decimal.  For
example:

>>> int(1e308)
100000000000000001097906362944045541740492309677311846336810682903157585404911491537163328978494688899061249669721172515611590283743140088328307009198146046031271664502933027185697489699588559043338384466165001178426897626212945177628091195786707458122783970171784415105291802893207873272974885715430223118336

Mark


Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Scott David Daniels

Mark Dickinson wrote:

> ... """[5]   These numbers are fairly arbitrary. They are intended to
>    avoid printing endless strings of meaningless digits without
>    hampering correct use and without having to know the exact
>    precision of floating point values on a particular machine."""
> I don't find this particularly convincing, though---I just don't see
> a really good reason not to give the user exactly what she/he
> asks for here.

As a user of Idle, I would not like to see the change you seek of
having %f stay full-precision.  When a number gets too long to print
on a single line, the wrap depends on the current window width, and
is calculated dynamically.  One section of the display with an 8000-digit
(100-line) text makes Idle slow to scroll around in.  It is
too easy for numbers to go massively positive in a bug.


>  - the change isn't gentle:  as you go over the 1e50 boundary,
>    the number of significant digits produced suddenly changes
>    from 56 to 6;  it would make more sense to me if it
>    stayed fixed at 56 sig digits for numbers larger than 1e50.
>
>  - now that we're using David Gay's 'perfect rounding'
>    code, we can be sure that the digits aren't entirely
>    meaningless, or at least that they're the 'right' meaningless
>    digits.  This wasn't true before.

However, this is, I agree, a problem.  Since all of these numbers
should end in a massive number of zeroes, how about we replace
only the trailing zeroes with the e, so we wind up with:
 1157920892373161954235709850086879078532699846656405640e+23
  or 115792089237316195423570985008687907853269984665640564.0e+24
or some such, rather than
 1.157920892373162e+77
  or 1.15792089237316195423570985008687907853269984665640564e+77

--Scott David Daniels
scott.dani...@acm.org



Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Mark Dickinson
On Sun, Apr 26, 2009 at 5:59 PM, Eric Smith wrote:
> Mark Dickinson wrote:
>> I propose changing the complex str and repr to behave like the
>> float version.  That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
>> rather than "(4+10j)".
>
> I'm +0.5 on this. I'd probably be +1 if I were a big complex user. Also, I'm
> not sure about the spaces around the sign. If we do want the spaces there,

Whoops.  The spaces were a mistake:  I'm not proposing to add those.
I meant "(4.0+10.0j)" rather than "(4.0 + 10.0j)".

Mark


Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Eric Smith

Mark Dickinson wrote:

> I'd like to propose two minor changes to float and complex
> formatting, for 3.1.  I don't think either change should prove
> particularly disruptive.
>
> (1) Currently, '%f' formatting automatically changes to '%g' formatting for
> numbers larger than 1e50.  For example:
>
> ...
>
> I propose removing this feature for 3.1


I'm +1 on this.


> I have a suspicion that at least part of the
> motivation for the '%f' -> '%g' switch is that it means the
> implementation can use a fixed-size buffer.  But Eric has
> fixed this (in 3.1, at least) and the buffer is now dynamically
> allocated, so this isn't a concern any more.


I agree that this is a big part of the reason it was done. There's still 
some work to be done in the fallback code which we use if we can't use 
Gay's implementation of _Py_dg_dtoa. But it's reasonably easy to 
calculate the maximum buffer size needed given the precision, for 
passing on to PyOS_snprintf. (At least I think that sentence is true, 
I'll verify with Mark offline).
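
As a rough illustration of that calculation (in Python rather than C,
and not CPython's actual code), the longest '%.<precision>f' result for
a finite double can be bounded from the precision alone.  The helper
name max_fixed_length is hypothetical, and IEEE 754 doubles plus Python
3.1 or later are assumed.

```python
import sys


def max_fixed_length(precision):
    """Upper bound on len('%.*f' % (precision, x)) for a finite double x.

    Hypothetical helper for illustration only.
    """
    # Digits before the point for the largest finite double.
    integer_digits = len(str(int(sys.float_info.max)))
    sign = 1
    # '%.0f' prints no decimal point at all.
    point_and_fraction = 1 + precision if precision > 0 else 0
    return sign + integer_digits + point_and_fraction


for precision in (0, 2, 6, 20):
    longest = '%.*f' % (precision, -sys.float_info.max)
    print(precision, max_fixed_length(precision), len(longest))
```

A C buffer sized this way would need one more byte for the terminating
NUL.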



> Other reasons not to switch from '%f' to '%g' in this way:
>
>  - the change isn't gentle:  as you go over the 1e50 boundary,
>    the number of significant digits produced suddenly changes
>    from 56 to 6;  it would make more sense to me if it
>    stayed fixed at 56 sig digits for numbers larger than 1e50.


This is the big reason for me.


>  - float formatting is already quite complicated enough; no
>    need to add to the mental complexity


And this, too.


> (2) complex str and repr don't behave like float str and repr, in that
> the float version always adds a trailing '.0' (unless there's an
> exponent), but the complex version doesn't:
>
> ...
>
> I propose changing the complex str and repr to behave like the
> float version.  That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
> rather than "(4+10j)".


I'm +0.5 on this. I'd probably be +1 if I were a big complex user. Also, 
I'm not sure about the spaces around the sign. If we do want the spaces 
there, we can get rid of Py_DTSF_SIGN, since that's the only place it's 
used and we won't be able to use it for complex going forward.


Eric.



Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Michael Foord

Steven D'Aprano wrote:
> On Sun, 26 Apr 2009 08:06:56 pm Mark Dickinson wrote:
>> I'd like to propose two minor changes to float and complex
>> formatting, for 3.1.  I don't think either change should prove
>> particularly disruptive.
>>
>> (1) Currently, '%f' formatting automatically changes to '%g'
>> formatting for numbers larger than 1e50.
> ...
>> I propose removing this feature for 3.1
>
> No objections from me. +1
>
>> I propose changing the complex str and repr to behave like the
>> float version.  That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
>> rather than "(4+10j)".
>
> No objections here either. +0

Doing it sooner rather than later means that it is less likely to
disrupt anyone relying on the representation (e.g. doctests).


Michael

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog




Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Steven D'Aprano
On Sun, 26 Apr 2009 08:06:56 pm Mark Dickinson wrote:
> I'd like to propose two minor changes to float and complex
> formatting, for 3.1.  I don't think either change should prove
> particularly disruptive.
>
> (1) Currently, '%f' formatting automatically changes to '%g'
> formatting for numbers larger than 1e50.
...
> I propose removing this feature for 3.1

No objections from me. +1

> I propose changing the complex str and repr to behave like the
> float version.  That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
> rather than "(4+10j)".

No objections here either. +0



-- 
Steven D'Aprano