[Numpy-discussion] Random int64 and float64 numbers

2009-11-01 Thread Thomas Robitaille
Hi,

I'm trying to generate random 64-bit integer and float values using Numpy,
within the entire range of valid values for each type. To generate random
32-bit floats, I can use:

np.random.uniform(low=np.finfo(np.float32).min, high=np.finfo(np.float32).max, size=10)

which gives for example

array([  1.47351436e+37,   9.93620693e+37,   2.22893053e+38,
 -3.33828977e+38,   1.08247781e+37,  -8.37481260e+37,
  2.64176554e+38,  -2.72207226e+37,   2.54790459e+38,
 -2.47883866e+38])

but if I try and use this for 64-bit numbers, i.e.

np.random.uniform(low=np.finfo(np.float64).min, high=np.finfo(np.float64).max, size=10)

I get

array([ Inf,  Inf,  Inf,  Inf,  Inf,  Inf,  Inf,  Inf,  Inf,  Inf])

Similarly, for integers, I can successfully generate random 32-bit  
integers:

np.random.random_integers(np.iinfo(np.int32).min, high=np.iinfo(np.int32).max, size=10)

which gives

array([-1506183689,   662982379, -1616890435, -1519456789,  1489753527,
        -604311122,  2034533014,   449680073,  -444302414, -1924170329])

but am unsuccessful for 64-bit integers, i.e.

np.random.random_integers(np.iinfo(np.int64).min, high=np.iinfo(np.int64).max, size=10)

which produces the following error:

OverflowError: long int too large to convert to int

Is this expected behavior, or are these bugs?

Thanks for any help,

Thomas
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-01 Thread Sturla Molden
Thomas Robitaille skrev:
> np.random.random_integers(np.iinfo(np.int32).min, high=np.iinfo(np.int32).max, size=10)
>
> which gives
>
> array([-1506183689,   662982379, -1616890435, -1519456789,  1489753527,
>         -604311122,  2034533014,   449680073,  -444302414, -1924170329])
>
>   

This fails on my computer (Python 2.6.4, NumPy 1.3.0 on Win32).

 >>> np.random.random_integers(np.iinfo(np.int32).min, high=np.iinfo(np.int32).max, size=10)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "mtrand.pyx", line 950, in mtrand.RandomState.random_integers
  File "mtrand.pyx", line 746, in mtrand.RandomState.randint
OverflowError: long int too large to convert to int

It might have something to do with this:

 >>> 2**31-1
2147483647L
 >>> -2**31
-2147483648L

In light of this annoying behaviour, here is a workaround:

import numpy as np

def random_int64(size):
    # Draw four independent 16-bit chunks and pack them into 64 bits.
    a0 = np.random.random_integers(0, 0xffff, size=size).astype(np.uint64)
    a1 = np.random.random_integers(0, 0xffff, size=size).astype(np.uint64)
    a2 = np.random.random_integers(0, 0xffff, size=size).astype(np.uint64)
    a3 = np.random.random_integers(0, 0xffff, size=size).astype(np.uint64)
    a = a0 + (a1 << 16) + (a2 << 32) + (a3 << 48)
    return a.view(dtype=np.int64)
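For what it's worth, the packing idea can be sanity-checked with randint (the function random_integers wraps; its upper bound is exclusive, hence 0x10000). This is a sketch assuming the masks in the function above are 0xffff, as the 16-bit shifts suggest:

```python
import numpy as np

def random_int64(size):
    # Four independent 16-bit draws, packed into 64 random bits.
    chunks = [np.random.randint(0, 0x10000, size=size).astype(np.uint64)
              for _ in range(4)]
    a = (chunks[0]
         | (chunks[1] << np.uint64(16))
         | (chunks[2] << np.uint64(32))
         | (chunks[3] << np.uint64(48)))
    return a.view(np.int64)

x = random_int64(1000)
print(x.dtype)  # int64
```

With 1000 draws, both positive and negative values show up, so the sign bit is clearly being exercised.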


Sturla











Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-01 Thread David Cournapeau
Thomas Robitaille wrote:
> Hi,
>
> I'm trying to generate random 64-bit integer values for integers and  
> floats using Numpy, within the entire range of valid values for that  
> type. To generate random 32-bit floats, I can use:
>
> np.random.uniform(low=np.finfo(np.float32).min, high=np.finfo(np.float32).max, size=10)
>
> which gives for example
>
> array([  1.47351436e+37,   9.93620693e+37,   2.22893053e+38,
>  -3.33828977e+38,   1.08247781e+37,  -8.37481260e+37,
>   2.64176554e+38,  -2.72207226e+37,   2.54790459e+38,
>  -2.47883866e+38])
>   
> but if I try and use this for 64-bit numbers, i.e.
>
> np.random.uniform(low=np.finfo(np.float64).min, high=np.finfo(np.float64).max, size=10)
>
> I get
>
> array([ Inf,  Inf,  Inf,  Inf,  Inf,  Inf,  Inf,  Inf,  Inf,  Inf])
>
> Similarly, for integers, I can successfully generate random 32-bit  
> integers:
>
> np.random.random_integers(np.iinfo(np.int32).min, high=np.iinfo(np.int32).max, size=10)
>
> which gives
>
> array([-1506183689,   662982379, -1616890435, -1519456789,  1489753527,
>         -604311122,  2034533014,   449680073,  -444302414, -1924170329])
>
> but am unsuccessful for 64-bit integers, i.e.
>
> np.random.random_integers(np.iinfo(np.int64).min, high=np.iinfo(np.int64).max, size=10)
>
> which produces the following error:
>
> OverflowError: long int too large to convert to int
>
> Is this expected behavior, or are these bugs?
>   

I think those are bugs, but it may be difficult to fix.

You can check that if you restrict your interval a tiny bit, you get
better results:

import numpy as np
# max/min for double precision is ~ 1.8e308
low, high = -1e308, 1e308
np.random.uniform(low, high, 100) # bunch of inf
low, high = -1e307, 1e307
np.random.uniform(low, high, 100) # much more reasonable

It may be that you are pushing the limits of the random generator. Your
min and max may be border cases: if you use the min/max representable
numbers, and the random generator needs to do any addition of a positive
number, you will 'overflow' your float number (Robert will have a better
answer to this). The problem is that it may be difficult to detect this
in advance.
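The border case is visible without drawing any random numbers at all. Presumably uniform computes something like low + (high - low) * u internally (an assumption on my part), and the width of the full float64 interval is itself not representable:

```python
import numpy as np

low = np.finfo(np.float64).min
high = np.finfo(np.float64).max
width = high - low  # ~3.6e308, beyond the float64 maximum of ~1.8e308
print(width)        # inf
```

Once the width overflows to inf, every sample comes out as inf, which matches the output in the original report.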

David


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-01 Thread josef . pktd
On Sun, Nov 1, 2009 at 10:20 PM, David Cournapeau wrote:
> I think those are bugs, but it may be difficult to fix.
>
> You can check that if you restrict a tiny bit your interval, you get
> better result:
>
> import numpy as np
> # max/min for double precision is ~ 1.8e308
> low, high = -1e308, 1e308
> np.random.uniform(low, high, 100) # bunch of inf
> low, high = -1e307, 1e307
> np.random.uniform(low, high, 100) # much more reasonable
>
> It may be that you are pushing the limits of the random generator. Your
> min and max may be border cases: if you use the min/max representable
> numbers, and the random generator needs to do any addition of a positive
> number, you will 'overflow' your float number (Robert will have a better
> answer to this). The problem is that it may be difficult to detect this
> in advance.
>
> David


array([ Inf,  Inf,  Inf,  Inf,  Inf,  Inf,  Inf,  Inf,  Inf,  Inf])

might actually be the right answer if you want a uniform distribution
on the real line.

I never realized how many huge numbers are out there until I saw that most
numbers in the example are around 1e+37 or 1e+38.

Josef


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-01 Thread Robert Kern
On Sun, Nov 1, 2009 at 20:57, Thomas Robitaille wrote:
> Hi,
>
> I'm trying to generate random 64-bit integer values for integers and
> floats using Numpy, within the entire range of valid values for that
> type.

64-bit and larger integers could be done, but it requires
modification. The integer distributions were written to support C
longs, not anything larger. You could also use .bytes() and
np.fromstring().
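A sketch of the bytes-based route Robert mentions. I've used np.frombuffer rather than np.fromstring for the reinterpretation, since the two are equivalent here and fromstring was later deprecated for this use:

```python
import numpy as np

n = 10
raw = np.random.bytes(8 * n)               # 8 random bytes per int64
ints = np.frombuffer(raw, dtype=np.int64)  # reinterpret the bytes as ints
print(ints.shape)  # (10,)
```

Since every bit pattern is a valid int64, this gives uniform draws over the full 64-bit range without going through the integer-distribution code at all.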

> To generate random 32-bit floats, I can use:
> np.random.uniform(low=np.finfo(np.float32).min, high=np.finfo(np.float32).max, size=10)

What is the use case here? I know of none. Floating point is a bit
weird and will cause you many problems over such an extended range.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-01 Thread David Cournapeau
josef.p...@gmail.com wrote:
>
> array([ Inf,  Inf,  Inf,  Inf,  Inf,  Inf,  Inf,  Inf,  Inf,  Inf])
>
> might actually be the right answer if you want a uniform distribution
> on the real line.

Does it make sense to define a uniform random variable whose range is
the extended real line? It would not have a distribution w.r.t. the
Lebesgue measure, right?

I must confess I quickly get lost in the mathematics of statistics, though.

David


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-01 Thread Sturla Molden
Robert Kern skrev:
> 64-bit and larger integers could be done, but it requires
> modification. The integer distributions were written to support C
> longs, not anything larger. You could also use .bytes() and
> np.fromstring().
>   
But as of Python 2.6.4, even 32-bit integers fail, at least on Windows.


Sturla


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-01 Thread Robert Kern
On Sun, Nov 1, 2009 at 22:17, Sturla Molden wrote:
> Robert Kern skrev:
>> 64-bit and larger integers could be done, but it requires
>> modification. The integer distributions were written to support C
>> longs, not anything larger. You could also use .bytes() and
>> np.fromstring().
>>
> But as of Python 2.6.4, even 32-bit integers fail, at least on Windows.

Then let me clarify: it was written to support integer ranges up to
sys.maxint. Absolutely, it would be desirable to extend it.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-01 Thread David Cournapeau
Sturla Molden wrote:
> Robert Kern skrev:
>   
>> 64-bit and larger integers could be done, but it requires
>> modification. The integer distributions were written to support C
>> longs, not anything larger. You could also use .bytes() and
>> np.fromstring().
>>   
>> 
> But as of Python 2.6.4, even 32-bit integers fail, at least on Windows.
>   

It fails on Linux as well - I think it is a 32- vs 64-bit issue, not a
Windows vs Linux one. I don't know what happens on 64-bit Windows, though:
we may have issues if we use long.

cheers,

David


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-01 Thread Sturla Molden
Robert Kern skrev:
> Then let me clarify: it was written to support integer ranges up to
> sys.maxint. Absolutely, it would be desirable to extend it.
>
>   
I know, but look at this:

 >>> import sys
 >>> sys.maxint
2147483647
 >>> 2**31-1
2147483647L

The same value, computed as 2**31-1, becomes a long, and that is what
confuses mtrand.


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-01 Thread josef . pktd
On Sun, Nov 1, 2009 at 10:55 PM, David Cournapeau wrote:
> josef.p...@gmail.com wrote:
>>
>> array([ Inf,  Inf,  Inf,  Inf,  Inf,  Inf,  Inf,  Inf,  Inf,  Inf])
>>
>> might actually be the right answer if you want a uniform distribution
>> on the real line.
>
> Does it make sense to define a uniform random variable whose range is
> the extended real line ? It would not have a distribution w.r.t the
> Lebesgue measure, right ?
>
> I must confess I am quickly lost in the maths in statistics, though,

No, it wouldn't be a proper distribution. However, in Bayesian analysis
it is used as an improper (diffuse) prior, which often replicates
frequentist results. But it's a theoretical derivation; I don't think
anyone tries to simulate it.

To simulate huge uniform integers, I think it should be possible to use
the floating point random numbers and rescale and round them.

Josef

>
> David


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-01 Thread Sturla Molden
Robert Kern skrev:
> Then let me clarify: it was written to support integer ranges up to
> sys.maxint. Absolutely, it would be desirable to extend it.
>
>   
Actually it only supports integers up to sys.maxint-1, as
random_integers calls randint. random_integers includes the upper bound,
but randint excludes it. Thus, this happens on line 1153 in
mtrand.pyx:

return self.randint(low, high+1, size)

The main source of the problem is that numbers no larger than sys.maxint
can still become longs. (I have asked why on python-dev; it does not make
any sense.) So when random_integers passes "high+1" to randint, it is
unnecessarily converted to a long. Then there is an exception on line 847:

hi = high

With hi previously declared as a C long, Cython refuses the conversion.
Now, we could try a downcast to int like this:

hi = int(high)

which would make Cython raise an exception only in case of an integer
overflow.

 >>> int(2**31)
2147483648L
 >>> int(2**31-1)
2147483647

If there is no overflow, high becomes an int and conversion to C long is
allowed. Still, this will only support integer ranges up to sys.maxint -
1. We thus have to swap the roles of randint and random_integers: the one
with the inclusive upper bound should call rk_interval.
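For readers on a newer NumPy (1.11 or later, where randint grew a dtype argument), the full 64-bit range became reachable directly; note that randint's upper bound is exclusive:

```python
import numpy as np

lo = np.iinfo(np.int64).min
hi = np.iinfo(np.int64).max   # exclusive upper bound here
x = np.random.randint(lo, hi, size=10, dtype=np.int64)
print(x.dtype)  # int64
```

This sidesteps the C-long limitation described above, because the range arithmetic is done in the requested dtype rather than the platform long.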


Sturla


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-01 Thread Sturla Molden
Sturla Molden skrev:
> Robert Kern skrev:
>   
>> Then let me clarify: it was written to support integer ranges up to
>> sys.maxint. Absolutely, it would be desirable to extend it.
>>
>>   
>> 
> Actually it only supports integers up to sys.maxint-1, as 
> random_integers call randint. random_integers includes the upper range, 
> but randint excludes the upper range. Thus, this happens on line 1153 in 
> mtrand.pyx:
>
> return self.randint(low, high+1, size)
>
> inclusive upper interval should call rk_interval
>   

I love this one:

cdef long lo, hi, diff
[...]
diff = hi - lo - 1

which silently overflows, and is the reason for this strange exception:

 >>> np.random.random_integers(-2147483648, high=2147483646, size=10)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "mtrand.pyx", line 950, in mtrand.RandomState.random_integers
  File "mtrand.pyx", line 750, in mtrand.RandomState.randint
ValueError: low >= high

I'll call this a bug.
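The silent wraparound can be reproduced in pure Python by emulating what signed 32-bit C long arithmetic does (a sketch; the real code does this in Cython, not Python):

```python
def wrap32(x):
    # Emulate signed 32-bit two's-complement wraparound, which is what
    # C long arithmetic does silently on a 32-bit platform.
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

lo, hi = -2147483648, 2147483646
diff = wrap32(hi - lo - 1)
print(diff)  # -3
```

The true difference is 4294967293, which does not fit in 32 bits; it wraps to -3, a negative value, and that is why randint reports the spurious "low >= high".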


Sturla


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-01 Thread Robert Kern
On Sun, Nov 1, 2009 at 23:14, Sturla Molden wrote:

> I'll call this a bug.

Yes.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-01 Thread David Cournapeau
josef.p...@gmail.com wrote:
>
> No, it wouldn't be a proper distribution. However in Bayesian analysis
> it is used as an improper (diffuse) prior

Ah, right - I wonder how this is handled rigorously, though. I know some
basics of Bayesian statistics, but I don't know much about Bayesian
statistics from a theoretical POV (i.e. a rigorous mathematical
development).

> To simulate huge uniform integers, I think it should be possible to use
> the floating point random numbers and rescale and round them.
>   

Rescaling and especially rounding may bias the distribution, no? The
best (but long-term) strategy would be to support arbitrary precision
integers, as mentioned by Robert.
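One concrete source of the coarseness: a double carries only 53 significand bits, so rescaling a uniform double can never distinguish most 64-bit integers. Near 2**63, adjacent doubles are 2**(63-52) = 2048 apart:

```python
# 2**63 and 2**63 + 1 collapse to the same double, since doubles in
# [2**63, 2**64) are spaced 2048 apart.
print(float(2**63 + 1) == float(2**63))  # True
print(2.0**63 + 1000.0 == 2.0**63)       # True: 1000 is below half the spacing
```

So at most about 2**53 of the 2**64 possible int64 values are even reachable by scaling one double, quite apart from any rounding bias.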

David


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-01 Thread David Cournapeau
Sturla Molden wrote:
> Sturla Molden skrev:
>   
>> Robert Kern skrev:
>>   
>> 
>>> Then let me clarify: it was written to support integer ranges up to
>>> sys.maxint. Absolutely, it would be desirable to extend it.
>>>
>>>   
>>> 
>>>   
>> Actually it only supports integers up to sys.maxint-1, as 
>> random_integers call randint. random_integers includes the upper range, 
>> but randint excludes the upper range. Thus, this happens on line 1153 in 
>> mtrand.pyx:
>>
>> return self.randint(low, high+1, size)
>>
>> inclusive upper interval should call rk_interval
>>   
>> 
>
> I love this one:
>
> cdef long lo, hi, diff
> [...]
> diff = hi - lo - 1
>
> which silently overflows, and is the reason for this strange exception:
>
>  >>> np.random.random_integers(-2147483648,high=2147483646,size=10)
>
> Traceback (most recent call last):
>   File "", line 1, in 
> np.random.random_integers(-2147483648,high=2147483646,size=10)
>   File "mtrand.pyx", line 950, in mtrand.RandomState.random_integers
>   File "mtrand.pyx", line 750, in mtrand.RandomState.randint
> ValueError: low >= high
>
> I'll call this a bug.
>   

Yep, I was bitten by it as well: http://projects.scipy.org/numpy/ticket/965

David


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-02 Thread Anne Archibald
2009/11/1 Thomas Robitaille :
> Hi,
>
> I'm trying to generate random 64-bit integer values for integers and
> floats using Numpy, within the entire range of valid values for that
> type. To generate random 32-bit floats, I can use:

Others have addressed why this is giving bogus results. But it's worth
thinking about what you're actually trying to do. If it's "fuzz
tests", that is, producing unrestricted random floats to feed to a
function, then even if this worked it wouldn't produce what you want:
it will never produce floats of order unity, for example.

If you want to do what you described, you could produce floats uniformly
from -1 to 1 and then multiply the results by the largest
representable float.

If you want "random floats", I suggest generating an array of random
bytes and reinterpreting them as floats. You'll get a fair number of
NaNs and infinities, so you may want to take only the finite values
and regenerate the rest. This will give you some numbers from all over
the range of floats, including tiny values (and denormalized values,
which are a potentially important special case).
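A sketch of this suggestion (the helper name random_float64 is mine): draw random bytes, view them as doubles, and resample the non-finite slots until everything is finite:

```python
import numpy as np

def random_float64(n):
    # Reinterpret random bytes as doubles; about 1 in 2048 bit patterns
    # is a NaN or infinity, so resample those slots until none remain.
    out = np.frombuffer(np.random.bytes(8 * n), dtype=np.float64).copy()
    bad = ~np.isfinite(out)
    while bad.any():
        out[bad] = np.frombuffer(np.random.bytes(8 * int(bad.sum())),
                                 dtype=np.float64)
        bad = ~np.isfinite(out)
    return out

x = random_float64(100)
```

This samples uniformly over bit patterns rather than over values, which is usually what fuzz testing wants: tiny, huge, and denormalized values all appear.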

Anne



Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-05 Thread David Goldsmith
Interesting thread, which leaves me wondering two things. First, is it
documented somewhere (e.g., at the IEEE site) precisely how many *decimal*
mantissae are representable using the 64-bit IEEE standard for float
representation (if that makes sense)? Second, are such decimal mantissae
uniformly distributed, i.e., is the distance between each the same, or does
the incommensurability of two and five (and thus binary and decimal) make
them irregularly, or bi-periodically, spaced, i.e., with every other
mantissa regularly spaced?

DG



Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-05 Thread David Warde-Farley
On 5-Nov-09, at 4:54 PM, David Goldsmith wrote:

> Interesting thread, which leaves me wondering two things: is it  
> documented
> somewhere (e.g., at the IEEE site) precisely how many *decimal*  
> mantissae
> are representable using the 64-bit IEEE standard for float  
> representation
> (if that makes sense);

IEEE-754 says nothing about decimal representations aside from how to  
round when converting to and from strings. You have to provide/accept  
*at least* 9 decimal digits in the significand for single-precision  
and 17 for double-precision (section 5.6). AFAIK implementations will  
vary in how they handle cases where a binary significand would yield  
more digits than that.
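The 17-digit rule is easy to check: printing a double with 17 significant decimal digits and parsing the string back always recovers the original value exactly (a quick sketch):

```python
import math

for x in (1.0 / 3.0, math.pi, 1e-300, 1.5):
    s = '%.17g' % x       # 17 significant decimal digits
    assert float(s) == x  # exact round-trip back to the same double
print('round-trips OK')
```

Fewer digits sometimes suffice (repr picks the shortest such string in later Python versions), but 17 is the count that is guaranteed to work for every binary64 value.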

David


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-05 Thread Charles R Harris
On Thu, Nov 5, 2009 at 4:26 PM, David Warde-Farley wrote:

> On 5-Nov-09, at 4:54 PM, David Goldsmith wrote:
>
> > Interesting thread, which leaves me wondering two things: is it
> > documented
> > somewhere (e.g., at the IEEE site) precisely how many *decimal*
> > mantissae
> > are representable using the 64-bit IEEE standard for float
> > representation
> > (if that makes sense);
>
> IEEE-754 says nothing about decimal representations aside from how to
> round when converting to and from strings. You have to provide/accept
> *at least* 9 decimal digits in the significand for single-precision
> and 17 for double-precision (section 5.6). AFAIK implementations will
> vary in how they handle cases where a binary significand would yield
> more digits than that.
>
>
I believe that was the argument for the extended precision formats. The
given number of decimal digits is sufficient to recover the same float that
produced them if a slightly higher precision is used in the conversion.

Chuck


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-05 Thread josef . pktd
On Thu, Nov 5, 2009 at 6:36 PM, Charles R Harris wrote:
>
>
> On Thu, Nov 5, 2009 at 4:26 PM, David Warde-Farley 
> wrote:
>>
>> On 5-Nov-09, at 4:54 PM, David Goldsmith wrote:
>>
>> > Interesting thread, which leaves me wondering two things: is it
>> > documented
>> > somewhere (e.g., at the IEEE site) precisely how many *decimal*
>> > mantissae
>> > are representable using the 64-bit IEEE standard for float
>> > representation
>> > (if that makes sense);
>>
>> IEEE-754 says nothing about decimal representations aside from how to
>> round when converting to and from strings. You have to provide/accept
>> *at least* 9 decimal digits in the significand for single-precision
>> and 17 for double-precision (section 5.6). AFAIK implementations will
>> vary in how they handle cases where a binary significand would yield
>> more digits than that.
>>
>
> I believe that was the argument for the extended precision formats. The
> givien number of decimal digits is sufficient to recover the same float that
> produced them if a slightly higher precision is used in the conversion.
>
> Chuck

From the discussion of the floating point representation, it seems that
a uniform random number generator would have a very coarse grid
in the range, for example, -1e30 to +1e30 compared to the interval [-0.5, 0.5].

How many points can be represented by a float in [-0.5, 0.5] compared
to [1e30, 1e30+1.]? If I interpret this correctly, there are as many
floating point numbers in [0,1] as in [1,inf), or am I misinterpreting this?

So how does a PRNG handle a huge interval of uniform numbers?
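np.spacing answers the first question directly, by giving the gap to the next representable double at a given point:

```python
import numpy as np

print(np.spacing(0.5))   # ~1.1e-16: the grid is very fine near 0.5
print(np.spacing(1e30))  # ~1.4e+14: near 1e30, adjacent doubles are far apart
# In particular, [1e30, 1e30 + 1.] contains no extra doubles at all:
print(1e30 + 1.0 == 1e30)  # True
```

So the interval [-0.5, 0.5] holds on the order of 2**62 doubles, while [1e30, 1e30+1.] holds none beyond its endpoint: the grid is relative, not absolute.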

Josef



Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-05 Thread Charles R Harris
On Thu, Nov 5, 2009 at 7:04 PM, josef.p...@gmail.com wrote:

> From the discussion for the floating point representation, it seems that
> a uniform random number generator would have a very coarse grid
> in the range for example -1e30 to +1e30 compared to interval -0.5,0.5.
>
> How many points can be represented by a float in [-0.5,0.5] compared
> to [1e30, 1e30+1.]?
> If I interpret this correctly, then there are as many floating point
> numbers
> in [0,1] as in [1,inf), or am I misinterpreting this.
>
> So how does a PRNG handle a huge interval of uniform numbers?
>
>
There are several implementations, but the ones I'm familiar with reduce to
scaling. If the rng produces random unsigned integers, then the range of
integers is scaled to the interval [0,1). The variations involve explicit
scaling (portable) or bit twiddling of the IEEE formats. In straight forward
scaling some ranges of the random integers may not map 1-1, so the unused
bits are masked off first; if you want doubles you only need 52 bits, etc.
For bit twiddling there is an implicit 1 in the mantissa, so the basic range
works out to [1,2), but that can be fixed by subtracting 1 from the result.
Handling larger ranges than [0,1) just involves another scaling.
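For concreteness, here is a sketch of both variants in Python. The constant
`raw` stands in for one 64-bit draw from the rng bitstream; this mirrors the
schemes described above, not NumPy's actual source:

```python
import struct

raw = 0x243F6A8885A308D3  # stands in for one 64-bit draw from the rng bitstream

# Explicit scaling: keep the top 53 bits and divide by 2**53 -> [0, 1).
u = (raw >> 11) / 2.0**53

# Bit twiddling: put 52 random bits into the fraction field of a double whose
# exponent encodes [1, 2), reinterpret the bits, then subtract 1.
bits = (0x3FF << 52) | (raw & ((1 << 52) - 1))
v = struct.unpack('<d', struct.pack('<Q', bits))[0] - 1.0

# A larger range is just another affine scaling of [0, 1).
low, high = -0.5, 0.5
x = low + u * (high - low)
```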

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-05 Thread josef . pktd
On Thu, Nov 5, 2009 at 9:23 PM, Charles R Harris
 wrote:
>
>
> On Thu, Nov 5, 2009 at 7:04 PM,  wrote:
>>
>> On Thu, Nov 5, 2009 at 6:36 PM, Charles R Harris
>>  wrote:
>> >
>> >
>> > On Thu, Nov 5, 2009 at 4:26 PM, David Warde-Farley 
>> > wrote:
>> >>
>> >> On 5-Nov-09, at 4:54 PM, David Goldsmith wrote:
>> >>
>> >> > Interesting thread, which leaves me wondering two things: is it
>> >> > documented
>> >> > somewhere (e.g., at the IEEE site) precisely how many *decimal*
>> >> > mantissae
>> >> > are representable using the 64-bit IEEE standard for float
>> >> > representation
>> >> > (if that makes sense);
>> >>
>> >> IEEE-754 says nothing about decimal representations aside from how to
>> >> round when converting to and from strings. You have to provide/accept
>> >> *at least* 9 decimal digits in the significand for single-precision
>> >> and 17 for double-precision (section 5.6). AFAIK implementations will
>> >> vary in how they handle cases where a binary significand would yield
>> >> more digits than that.
>> >>
>> >
>> > I believe that was the argument for the extended precision formats. The
>> > given number of decimal digits is sufficient to recover the same float
>> > that
>> > produced them if a slightly higher precision is used in the conversion.
>> >
>> > Chuck
>>
>> From the discussion for the floating point representation, it seems that
>> a uniform random number generator would have a very coarse grid
>> in the range for example -1e30 to +1e30 compared to interval -0.5,0.5.
>>
>> How many points can be represented by a float in [-0.5,0.5] compared
>> to [1e30, 1e30+1.]?
>> If I interpret this correctly, then there are as many floating point
>> numbers
>> in [0,1] as in [1,inf), or am I misinterpreting this.
>>
>> So how does a PRNG handle a huge interval of uniform numbers?
>>
>
> There are several implementations, but the ones I'm familiar with reduce to
> scaling. If the rng produces random unsigned integers, then the range of
> integers is scaled to the interval [0,1). The variations involve explicit
> scaling (portable) or bit twiddling of the IEEE formats. In straightforward
> scaling some ranges of the random integers may not map 1-1, so the unused
> bits are masked off first; if you want doubles you only need 52 bits, etc.
> For bit twiddling there is an implicit 1 in the mantissa, so the basic range
> works out to [1,2), but that can be fixed by subtracting 1 from the result.
> Handling larger ranges than [0,1) just involves another scaling.

So, since this is then a discrete distribution, what is the number of points
in the support? (My guess would be 2**52, but I don't know much about
numerical representations.)

This would be the largest set of integers that could be generated
without gaps in the distribution, and would determine the grid size
for floating point random variables. (?)

for David's example:
low, high = -1e307, 1e307
np.random.uniform(low, high, 100) # much more reasonable

this would imply a grid size of
>>> 2*1e307/2.**52
4.4408920985006261e+291

or something similar. (floating points are not very dense in the real line.)
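As a quick check of that estimate (the grid here follows the 2**52 figure
above; `np.nextafter` probes the spacing of actual doubles near the top of the
range):

```python
import numpy as np

low, high = -1e307, 1e307

# Grid implied by spreading 2**52 equally spaced points over [low, high):
grid = (high - low) / 2.0**52                    # ~4.44e291

# Spacing between adjacent representable doubles near the top of the range:
ulp_top = np.nextafter(1e307, np.inf) - 1e307    # ~1.25e291
```

Both gaps are of order 1e291, so the uniform grid is only a small constant
factor coarser than the floats themselves up there.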

Josef


>
> Chuck
>


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-05 Thread David Cournapeau
On Fri, Nov 6, 2009 at 6:54 AM, David Goldsmith  wrote:
> Interesting thread, which leaves me wondering two things: is it documented
> somewhere (e.g., at the IEEE site) precisely how many *decimal* mantissae
> are representable using the 64-bit IEEE standard for float representation
> (if that makes sense); and are such decimal mantissae uniformly distributed

They are definitely not uniformly distributed: that's why two numbers
are close around 1 when they have only a few EPS difference, but
around 1e100, you have to add quite a few EPS to even get a different
number at all.

That may be my audio processing background, but I like to think about
float as numbers which have the same relative precision at any "level"
- a kind of dB scale. If you want numbers with a fixed number of
decimals, you need a fixed point representation.

David


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-05 Thread Charles R Harris
On Thu, Nov 5, 2009 at 7:53 PM,  wrote:

> On Thu, Nov 5, 2009 at 9:23 PM, Charles R Harris
>  wrote:
> >
> >
> > On Thu, Nov 5, 2009 at 7:04 PM,  wrote:
> >>
> >> On Thu, Nov 5, 2009 at 6:36 PM, Charles R Harris
> >>  wrote:
> >> >
> >> >
> >> > On Thu, Nov 5, 2009 at 4:26 PM, David Warde-Farley <
> d...@cs.toronto.edu>
> >> > wrote:
> >> >>
> >> >> On 5-Nov-09, at 4:54 PM, David Goldsmith wrote:
> >> >>
> >> >> > Interesting thread, which leaves me wondering two things: is it
> >> >> > documented
> >> >> > somewhere (e.g., at the IEEE site) precisely how many *decimal*
> >> >> > mantissae
> >> >> > are representable using the 64-bit IEEE standard for float
> >> >> > representation
> >> >> > (if that makes sense);
> >> >>
> >> >> IEEE-754 says nothing about decimal representations aside from how to
> >> >> round when converting to and from strings. You have to provide/accept
> >> >> *at least* 9 decimal digits in the significand for single-precision
> >> >> and 17 for double-precision (section 5.6). AFAIK implementations will
> >> >> vary in how they handle cases where a binary significand would yield
> >> >> more digits than that.
> >> >>
> >> >
> >> > I believe that was the argument for the extended precision formats.
> The
> >> > given number of decimal digits is sufficient to recover the same
> float
> >> > that
> >> > produced them if a slightly higher precision is used in the
> conversion.
> >> >
> >> > Chuck
> >>
> >> From the discussion for the floating point representation, it seems
> that
> >> a uniform random number generator would have a very coarse grid
> >> in the range for example -1e30 to +1e30 compared to interval -0.5,0.5.
> >>
> >> How many points can be represented by a float in [-0.5,0.5] compared
> >> to [1e30, 1e30+1.]?
> >> If I interpret this correctly, then there are as many floating point
> >> numbers
> >> in [0,1] as in [1,inf), or am I misinterpreting this.
> >>
> >> So how does a PRNG handle a huge interval of uniform numbers?
> >>
> >
> > There are several implementations, but the ones I'm familiar with reduce
> to
> > scaling. If the rng produces random unsigned integers, then the range of
> > integers is scaled to the interval [0,1). The variations involve explicit
> > scaling (portable) or bit twiddling of the IEEE formats. In straightforward
> > scaling some ranges of the random integers may not map 1-1, so the unused
> > bits are masked off first; if you want doubles you only need 52 bits,
> etc.
> > For bit twiddling there is an implicit 1 in the mantissa, so the basic
> range
> > works out to [1,2), but that can be fixed by subtracting 1 from the
> result.
> > Handling larger ranges than [0,1) just involves another scaling.
>
> So, since this is then a discrete distribution, what is the number of
> points
> in the support? (My guess would be 2**52, but I don't know much about
> numerical representations.)
>
> This would be the largest set of integers that could be generated
> without gaps in the distribution, and would determine the grid size
> for floating point random variables. (?)
>
>
Yes.


> for David's example:
> low, high = -1e307, 1e307
> np.random.uniform(low, high, 100) # much more reasonable
>
> this would imply a grid size of
> >>> 2*1e307/2.**52
> 4.4408920985006261e+291
>
> or something similar. (floating points are not very dense in the real
> line.)
>
>
Yes, or rather, they are more or less logarithmically distributed. Floats
are basically logarithms base 2 with a mantissa of fixed precision. Actual
logarithm code uses the exponent and does a correction to the mantissa,
i.e., takes the logarithm base two of numbers in the range [1,2).

Random integers are treated differently. To get random integers in a given
range, say [0,100], the bitstream produced by the rng would be broken up
into 7 bit chunks [0, 127] that are used in a sampling algorithm that loops
until a value in the range is produced. So any range of integers can be
produced. Code for that comes with the Mersenne Twister, but I don't know if
numpy uses it.
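A minimal sketch of that mask-and-reject loop (the function name and the use
of Python's `random.getrandbits` are illustrative; this is not numpy's actual
code path):

```python
import random

def randint_rejection(lo, hi):
    """Uniform integer in [lo, hi] via mask-and-reject on the bitstream."""
    span = hi - lo + 1
    nbits = span.bit_length()          # [0, 100] -> 101 values -> 7 bits
    while True:
        r = random.getrandbits(nbits)  # uniform chunk in [0, 2**nbits)
        if r < span:                   # reject out-of-range chunks, loop again
            return lo + r

samples = [randint_rejection(0, 100) for _ in range(1000)]
```

Since each chunk is accepted with probability span / 2**nbits > 1/2, the loop
terminates quickly in expectation.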

Chuck


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-05 Thread David Goldsmith
On Thu, Nov 5, 2009 at 3:26 PM, David Warde-Farley wrote:

> On 5-Nov-09, at 4:54 PM, David Goldsmith wrote:
>
> > Interesting thread, which leaves me wondering two things: is it
> > documented
> > somewhere (e.g., at the IEEE site) precisely how many *decimal*
> > mantissae
> > are representable using the 64-bit IEEE standard for float
> > representation
> > (if that makes sense);
>
> IEEE-754 says nothing about decimal representations aside from how to
> round when converting to and from strings. You have to provide/accept
> *at least* 9 decimal digits in the significand for single-precision
> and 17 for double-precision (section 5.6). AFAIK implementations will
> vary in how they handle cases where a binary significand would yield
> more digits than that.
>

I was actually more interested in the opposite situation, where the decimal
representation (which is what a user would most likely provide) doesn't have
a finite binary expansion: what happens then, something analogous to the
decimal "rule of fives"?

DG


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-05 Thread Anne Archibald
2009/11/5 David Goldsmith :
> On Thu, Nov 5, 2009 at 3:26 PM, David Warde-Farley 
> wrote:
>>
>> On 5-Nov-09, at 4:54 PM, David Goldsmith wrote:
>>
>> > Interesting thread, which leaves me wondering two things: is it
>> > documented
>> > somewhere (e.g., at the IEEE site) precisely how many *decimal*
>> > mantissae
>> > are representable using the 64-bit IEEE standard for float
>> > representation
>> > (if that makes sense);
>>
>> IEEE-754 says nothing about decimal representations aside from how to
>> round when converting to and from strings. You have to provide/accept
>> *at least* 9 decimal digits in the significand for single-precision
>> and 17 for double-precision (section 5.6). AFAIK implementations will
>> vary in how they handle cases where a binary significand would yield
>> more digits than that.
>
> I was actually more interested in the opposite situation, where the decimal
> representation (which is what a user would most likely provide) doesn't have
> a finite binary expansion: what happens then, something analogous to the
> decimal "rule of fives"?

If you interpret "0.1" as 1/10, then this is a very general
floating-point issue: how do you round off numbers you can't
represent exactly? The usual answer (leaving aside internal
extended-precision shenanigans) is to round, with the rule that when
you're exactly between two floating-point numbers you round to the one
that's even, rather than always up or always down (the numerical
analysis wizards tell us that this is more numerically stable).
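Both halves of this can be seen from Python (the exact value of the stored
double is recovered with `fractions.Fraction`; round-half-to-even is also what
Python 3's built-in `round()` uses):

```python
from fractions import Fraction

# "0.1" has no finite binary expansion, so the stored double is merely the
# nearest representable value; Fraction(0.1) recovers that value exactly.
stored = Fraction(0.1)
err = stored - Fraction(1, 10)        # tiny, but nonzero

# Round-half-to-even ("banker's rounding") in action:
halves = (round(0.5), round(1.5), round(2.5))
```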

Anne

> DG
>


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-05 Thread josef . pktd
On Thu, Nov 5, 2009 at 10:42 PM, David Goldsmith
 wrote:
> On Thu, Nov 5, 2009 at 3:26 PM, David Warde-Farley 
> wrote:
>>
>> On 5-Nov-09, at 4:54 PM, David Goldsmith wrote:
>>
>> > Interesting thread, which leaves me wondering two things: is it
>> > documented
>> > somewhere (e.g., at the IEEE site) precisely how many *decimal*
>> > mantissae
>> > are representable using the 64-bit IEEE standard for float
>> > representation
>> > (if that makes sense);
>>
>> IEEE-754 says nothing about decimal representations aside from how to
>> round when converting to and from strings. You have to provide/accept
>> *at least* 9 decimal digits in the significand for single-precision
>> and 17 for double-precision (section 5.6). AFAIK implementations will
>> vary in how they handle cases where a binary significand would yield
>> more digits than that.
>
> I was actually more interested in the opposite situation, where the decimal
> representation (which is what a user would most likely provide) doesn't have
> a finite binary expansion: what happens then, something analogous to the
> decimal "rule of fives"?

Since according to my calculations there are only about

>>> 4* 10**17 * 308
123200000000000000000L

double-precision floats, there are huge gaps in the floating point
representation of the real line.
Any user input or calculation result just gets converted to the
closest float.

Josef


>
> DG
>


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-05 Thread josef . pktd
On Thu, Nov 5, 2009 at 10:28 PM, Charles R Harris
 wrote:
>
>
> On Thu, Nov 5, 2009 at 7:53 PM,  wrote:
>>
>> On Thu, Nov 5, 2009 at 9:23 PM, Charles R Harris
>>  wrote:
>> >
>> >
>> > On Thu, Nov 5, 2009 at 7:04 PM,  wrote:
>> >>
>> >> On Thu, Nov 5, 2009 at 6:36 PM, Charles R Harris
>> >>  wrote:
>> >> >
>> >> >
>> >> > On Thu, Nov 5, 2009 at 4:26 PM, David Warde-Farley
>> >> > 
>> >> > wrote:
>> >> >>
>> >> >> On 5-Nov-09, at 4:54 PM, David Goldsmith wrote:
>> >> >>
>> >> >> > Interesting thread, which leaves me wondering two things: is it
>> >> >> > documented
>> >> >> > somewhere (e.g., at the IEEE site) precisely how many *decimal*
>> >> >> > mantissae
>> >> >> > are representable using the 64-bit IEEE standard for float
>> >> >> > representation
>> >> >> > (if that makes sense);
>> >> >>
>> >> >> IEEE-754 says nothing about decimal representations aside from how
>> >> >> to
>> >> >> round when converting to and from strings. You have to
>> >> >> provide/accept
>> >> >> *at least* 9 decimal digits in the significand for single-precision
>> >> >> and 17 for double-precision (section 5.6). AFAIK implementations
>> >> >> will
>> >> >> vary in how they handle cases where a binary significand would yield
>> >> >> more digits than that.
>> >> >>
>> >> >
>> >> > I believe that was the argument for the extended precision formats.
>> >> > The
>> >> > given number of decimal digits is sufficient to recover the same
>> >> > float
>> >> > that
>> >> > produced them if a slightly higher precision is used in the
>> >> > conversion.
>> >> >
>> >> > Chuck
>> >>
>> >> From the discussion for the floating point representation, it seems
>> >> that
>> >> a uniform random number generator would have a very coarse grid
>> >> in the range for example -1e30 to +1e30 compared to interval -0.5,0.5.
>> >>
>> >> How many points can be represented by a float in [-0.5,0.5] compared
>> >> to [1e30, 1e30+1.]?
>> >> If I interpret this correctly, then there are as many floating point
>> >> numbers
>> >> in [0,1] as in [1,inf), or am I misinterpreting this.
>> >>
>> >> So how does a PRNG handle a huge interval of uniform numbers?
>> >>
>> >
>> > There are several implementations, but the ones I'm familiar with reduce
>> > to
>> > scaling. If the rng produces random unsigned integers, then the range of
>> > integers is scaled to the interval [0,1). The variations involve
>> > explicit
>> > scaling (portable) or bit twiddling of the IEEE formats. In straightforward
>> > scaling some ranges of the random integers may not map 1-1, so the
>> > unused
>> > bits are masked off first; if you want doubles you only need 52 bits,
>> > etc.
>> > For bit twiddling there is an implicit 1 in the mantissa, so the basic
>> > range
>> > works out to [1,2), but that can be fixed by subtracting 1 from the
>> > result.
>> > Handling larger ranges than [0,1) just involves another scaling.
>>
>> So, since this is then a discrete distribution, what is the number of
>> points
>> in the support? (My guess would be 2**52, but I don't know much about
>> numerical representations.)
>>
>> This would be the largest set of integers that could be generated
>> without gaps in the distribution, and would determine the grid size
>> for floating point random variables. (?)
>>
>
> Yes.
>
>>
>> for David's example:
>> low, high = -1e307, 1e307
>> np.random.uniform(low, high, 100) # much more reasonable
>>
>> this would imply a grid size of
>> >>> 2*1e307/2.**52
>> 4.4408920985006261e+291
>>
>> or something similar. (floating points are not very dense in the real
>> line.)
>>
>
> Yes, or rather, they are more or less logarithmically distributed. Floats
> are basically logarithms base 2 with a mantissa of fixed precision. Actual
> logarithm code uses the exponent and does a correction to the mantissa,
> i.e., takes the logarithm base two of numbers in the range [1,2).
>
> Random integers are treated differently. To get random integers in a given
> range, say [0,100], the bitstream produced by the rng would be broken up
> into 7 bit chunks [0, 127] that are used in a sampling algorithm that loops
> until a value in the range is produced. So any range of integers can be
> produced. Code for that comes with the Mersenne Twister, but I don't know if
> numpy uses it.
>
> Chuck

Thanks for the explanation.

It's good to understand some of the limitations, but I don't think in any
Monte Carlo, I will have a distribution with tails that are fat enough
that the discretization becomes a problem.

Josef


>
>


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-05 Thread Anne Archibald
2009/11/5  :
> On Thu, Nov 5, 2009 at 10:42 PM, David Goldsmith
>  wrote:
>> On Thu, Nov 5, 2009 at 3:26 PM, David Warde-Farley 
>> wrote:
>>>
>>> On 5-Nov-09, at 4:54 PM, David Goldsmith wrote:
>>>
>>> > Interesting thread, which leaves me wondering two things: is it
>>> > documented
>>> > somewhere (e.g., at the IEEE site) precisely how many *decimal*
>>> > mantissae
>>> > are representable using the 64-bit IEEE standard for float
>>> > representation
>>> > (if that makes sense);
>>>
>>> IEEE-754 says nothing about decimal representations aside from how to
>>> round when converting to and from strings. You have to provide/accept
>>> *at least* 9 decimal digits in the significand for single-precision
>>> and 17 for double-precision (section 5.6). AFAIK implementations will
>>> vary in how they handle cases where a binary significand would yield
>>> more digits than that.
>>
>> I was actually more interested in the opposite situation, where the decimal
>> representation (which is what a user would most likely provide) doesn't have
>> a finite binary expansion: what happens then, something analogous to the
>> decimal "rule of fives"?
>
> Since according to my calculations there are only about
>
 4* 10**17 * 308
> 123200000000000000000L

More straightforwardly, it's not too far below 2**64.

> double-precision floats, there are huge gaps in the floating point
> representation of the real line.
> Any user input or calculation result just gets converted to the
> closest float.

Yes. But the "huge" gaps are only huge in absolute size; in fractional
error they're always about the same size. They are usually only an
issue if you're representing numbers in a small range with a huge
offset. To take a not-so-random example, you could be representing
times in days since November 17 1858, when what you care about are the
microsecond-scale differences in photon arrival times. Even then
you're probably okay as long as you compute directly (t[i] =
i*dt/86400 + start_t) rather than having some sort of running
accumulator (t[i]=t; t += dt/86400).

Professor Kahan's reasoning for using doubles for most everything is
that they generally have so much more precision than you actually need
that you can get away with being sloppy.
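The constant fractional gap is easy to see with `math.frexp` (a rough ulp
helper, assuming normal positive doubles; the helper name is illustrative):

```python
import math

def ulp(x):
    """Approximate gap to the next double above x, for normal positive x."""
    m, e = math.frexp(x)      # x = m * 2**e with 0.5 <= m < 1
    return 2.0 ** (e - 53)

# The relative gap stays within a factor of two of 2**-53 at every scale.
rels = [ulp(x) / x for x in (1.0, 52000.0, 1e30, 1e300)]
```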


Anne

> Josef
>
>
>>
>> DG
>>


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-05 Thread Charles R Harris
On Thu, Nov 5, 2009 at 9:14 PM,  wrote:

> On Thu, Nov 5, 2009 at 10:42 PM, David Goldsmith
>  wrote:
> > On Thu, Nov 5, 2009 at 3:26 PM, David Warde-Farley 
> > wrote:
> >>
> >> On 5-Nov-09, at 4:54 PM, David Goldsmith wrote:
> >>
> >> > Interesting thread, which leaves me wondering two things: is it
> >> > documented
> >> > somewhere (e.g., at the IEEE site) precisely how many *decimal*
> >> > mantissae
> >> > are representable using the 64-bit IEEE standard for float
> >> > representation
> >> > (if that makes sense);
> >>
> >> IEEE-754 says nothing about decimal representations aside from how to
> >> round when converting to and from strings. You have to provide/accept
> >> *at least* 9 decimal digits in the significand for single-precision
> >> and 17 for double-precision (section 5.6). AFAIK implementations will
> >> vary in how they handle cases where a binary significand would yield
> >> more digits than that.
> >
> > I was actually more interested in the opposite situation, where the
> decimal
> > representation (which is what a user would most likely provide) doesn't
> have
> > a finite binary expansion: what happens then, something analogous to the
> > decimal "rule of fives"?
>
> Since according to my calculations there are only about
>
> >>> 4* 10**17 * 308
> 123200000000000000000L
>
> double-precision floats, there are huge gaps in the floating point
> representation of the real line.
>

2**64 minus some flags for nans and such.
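The count works out like this (finite binary64 values only; the all-ones
exponent pattern is reserved for infs and NaNs):

```python
# Sign bit * exponent patterns * fraction patterns:
normals    = 2 * 2046 * 2**52        # biased exponents 1..2046 are normal
subnormals = 2 * (2**52 - 1)         # exponent 0, nonzero fraction
zeros      = 2                       # +0.0 and -0.0
finite = normals + subnormals + zeros

# Everything else among the 2**64 bit patterns is an inf or a NaN:
non_finite = 2**64 - finite
```

So the "flags" amount to exactly 2**53 bit patterns, leaving just under 2**64
finite doubles.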

Chuck


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-05 Thread David Goldsmith
On Thu, Nov 5, 2009 at 8:07 PM, Anne Archibald wrote:

> 2009/11/5 David Goldsmith :
> > On Thu, Nov 5, 2009 at 3:26 PM, David Warde-Farley 
> > wrote:
> >>
> >> On 5-Nov-09, at 4:54 PM, David Goldsmith wrote:
> >>
> >> > Interesting thread, which leaves me wondering two things: is it
> >> > documented
> >> > somewhere (e.g., at the IEEE site) precisely how many *decimal*
> >> > mantissae
> >> > are representable using the 64-bit IEEE standard for float
> >> > representation
> >> > (if that makes sense);
> >>
> >> IEEE-754 says nothing about decimal representations aside from how to
> >> round when converting to and from strings. You have to provide/accept
> >> *at least* 9 decimal digits in the significand for single-precision
> >> and 17 for double-precision (section 5.6). AFAIK implementations will
> >> vary in how they handle cases where a binary significand would yield
> >> more digits than that.
> >
> > I was actually more interested in the opposite situation, where the
> decimal
> > representation (which is what a user would most likely provide) doesn't
> have
> > a finite binary expansion: what happens then, something analogous to the
> > decimal "rule of fives"?
>
> If you interpret "0.1" as 1/10, then this is a very general
> floating-point issue: how you you round off numbers you can't
> represent exactly? The usual answer (leaving aside internal
> extended-precision shenanigans) is to round, with the rule that when
> you're exactly between two floating-point numbers you round to the one
> that's even, rather than always up or always down (the numerical
> analysis wizards tell us that this is more numerically stable).
>

More numerically stable, or simply unbiased, since half the time you're
rounding up and the other half rounding down (that's the rationale as it
was explained to me).

DG

>
> Anne
>
> > DG
> >


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-07 Thread Sturla Molden
David Cournapeau wrote:
> On Fri, Nov 6, 2009 at 6:54 AM, David Goldsmith  
> wrote:
>   
>> Interesting thread, which leaves me wondering two things: is it documented
>> somewhere (e.g., at the IEEE site) precisely how many *decimal* mantissae
>> are representable using the 64-bit IEEE standard for float representation
>> (if that makes sense); and are such decimal mantissae uniformly distributed
>> 
>
> They are definitely not uniformly distributed: that's why two numbers
> are close around 1 when they have only a few EPS difference, but
> around 1e100, you have to add quite a few EPS to even get a different
> number at all.
>
> That may be my audio processing background, but I like to think about
> float as numbers which have the same relative precision at any "level"
> - a kind of dB scale. If you want numbers with a fixed number of
> decimals, you need a fixed point representation.
>   

David Goldsmith was asking about the mantissa. For a double, that is a 
53 bit signed integer. I.e. you have a 52 bit fractional part (bits 0-51), 
an 11 bit exponent (bits 52-62), and one sign bit (bit 63). The mantissa is 
uniformly distributed like any signed integer. The mantissa of a double 
can take 2**53 different integer values: -2**52 to 2**52-1.

But the value of a normalized floating point number is

   value = (-1)**signbit * 2**(exponent - bias) * (1 + fraction)

with bias = 1023 for a double. Thus, floating point numbers are not 
uniformly distributed, but the mantissa is.
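The field layout can be checked directly by reinterpreting a double's bits;
note the implicit leading 1, so a normal value is 1.fraction times a power of
two:

```python
import struct

x = -6.5   # = -1.625 * 2**2
bits = struct.unpack('<Q', struct.pack('<d', x))[0]

sign     = bits >> 63                 # bit 63
exponent = (bits >> 52) & 0x7FF       # bits 52-62, biased by 1023
fraction = bits & ((1 << 52) - 1)     # bits 0-51

# Normal value: implicit leading 1 on the fraction.
value = (-1)**sign * 2.0**(exponent - 1023) * (1 + fraction / 2.0**52)
```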

For numerical illiterates this might come as a surprise. But in 
numerical mathematics, the resolution is in the number of "significant 
digits", not in "the number of decimals". 101 and .00201 have the same 
numerical precision.

A decimal, on the other hand, can be thought of as a floating point 
number using base-10 instead of base-2 for the exponent:

   value = (-1)**signbit * 10**(exponent - bias) * (1 + fraction)

Decimals and floats are not fundamentally different. There are numbers 
exactly representable with a decimal that cannot be exactly represented 
with a float. But numerical computations do not become more precise with 
a decimal than with a float.
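The difference in what is exactly representable (but not in precision) can be
illustrated with Python's decimal module:

```python
from decimal import Decimal

# 1/10 is exact as a decimal, inexact as a binary float:
d = Decimal('0.1') * 10               # exactly 1 in decimal arithmetic
f = sum(0.1 for _ in range(10))       # accumulates binary rounding error
exact_decimal = (d == Decimal('1'))
exact_float   = (f == 1.0)
```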



Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-08 Thread David Goldsmith
Thanks, Sturla, you've confirmed what I thought was the case (and explained
more thoroughly the answer others gave more succinctly, but also more
opaquely). :-)

DG

On Sat, Nov 7, 2009 at 11:40 AM, Sturla Molden  wrote:

> David Cournapeau wrote:
> > On Fri, Nov 6, 2009 at 6:54 AM, David Goldsmith 
> wrote:
> >
> >> Interesting thread, which leaves me wondering two things: is it
> documented
> >> somewhere (e.g., at the IEEE site) precisely how many *decimal*
> mantissae
> >> are representable using the 64-bit IEEE standard for float
> representation
> >> (if that makes sense); and are such decimal mantissae uniformly
> distributed
> >>
> >
> > They are definitely not uniformly distributed: that's why two numbers
> > are close around 1 when they have only a few EPS difference, but
> > around 1e100, you have to add quite a few EPS to even get a different
> > number at all.
> >
> > That may be my audio processing background, but I like to think about
> > float as numbers which have the same relative precision at any "level"
> > - a kind of dB scale. If you want numbers with a fixed number of
> > decimals, you need a fixed point representation.
> >
>
> David Goldsmith was asking about the mantissa. For a double, that is a
> 53 bit signed integer. I.e. you have a 52 bit fractional part (bits 0-51),
> an 11 bit exponent (bits 52-62), and one sign bit (bit 63). The mantissa is
> uniformly distributed like any signed integer. The mantissa of a double
> can take 2**53 different integer values: -2**52 to 2**52-1.
>
> But the value of a floating point number is
>
>   value = (-1)**signbit * 2**(exponent - bias) * (1 + fraction)
>
> with bias = 1023 for a double. Thus, floating point numbers are not
> uniformly distributed, but the mantissa is.
>
> For numerical illiterates this might come as a surprise. But in
> numerical mathematics, the resolution is in the number of "significant
> digits", not in "the number of decimals". 101 and .00201 have the same
> numerical precision.
>
> A decimal, on the other hand, can be thought of as a floating point
> number using base-10 instead of base-2 for the exponent:
>
>   value = (-1)**signbit * 10**(exponent - bias) * (1 + fraction)
>
> Decimals and floats are not fundamentally different. There are numbers
> exactly representable with a decimal that cannot be exactly represented
> with a float. But numerical computations do not become more precise with
> a decimal than with a float.