Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Sturla Molden
On 24.01.2012 06:32, Sturla Molden wrote:

> The use of C long affects all the C and Pyrex source code in mtrand
> module, not just mtrand.pyx. All of it is fubar on Win64.

randomkit.c handles C long correctly, I think. There are different codes 
for 32 and 64 bit C long, and buffer sizes are size_t.

Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Sturla Molden
On 24.01.2012 09:21, Sturla Molden wrote:

> randomkit.c handles C long correctly, I think. There are different codes
> for 32 and 64 bit C long, and buffer sizes are size_t.

distributions.c takes C longs as parameters, e.g. for the binomial 
distribution. mtrand.pyx correctly handles this, but it can give an 
unexpected overflow error on 64-bit Windows:


In [1]: np.random.binomial(2**31, .5)
---
OverflowError Traceback (most recent call last)
C:\Windows\system32\ in ()
> 1 np.random.binomial(2**31, .5)

C:\Python27\lib\site-packages\numpy\random\mtrand.pyd in 
mtrand.RandomState.binomial (numpy\random\mtrand\mtrand.c:13770)()

OverflowError: Python int too large to convert to C long


On systems where C longs are 64 bits, this will likely not produce an 
error.

This raises the question of whether randomkit.c and distributions.c should 
also be changed to use npy_intp for consistency across all platforms.

(I assume we are not supporting 16 bit NumPy, in which case we will need 
C long there...)


Sturla
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Sturla Molden
On 24.01.2012 06:32, Sturla Molden wrote:
> Den 24.01.2012 06:00, skrev Sturla Molden:
>> Both i and length could overflow here. It should overflow on
>> allocation of more than 2 GB. There is also a lot of C longs in the
>> internal state (line 55-105), as well as the other functions.
>
> The use of C long affects all the C and Pyrex source code in mtrand
> module, not just mtrand.pyx. All of it is fubar on Win64.


The coding is also inconsistent; compare, for example:

https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L180

https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L201



Sturla
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Robert Kern
On Tue, Jan 24, 2012 at 08:37, Sturla Molden  wrote:
> On 24.01.2012 09:21, Sturla Molden wrote:
>
>> randomkit.c handles C long correctly, I think. There are different codes
>> for 32 and 64 bit C long, and buffer sizes are size_t.
>
> distributions.c take C longs as parameters e.g. for the binomial
> distribution. mtrand.pyx correctly handles this, but it can give an
> unexpected overflow error on 64-bit Windows:
>
>
> In [1]: np.random.binomial(2**31, .5)
> ---
> OverflowError                             Traceback (most recent call last)
> C:\Windows\system32\ in ()
> > 1 np.random.binomial(2**31, .5)
>
> C:\Python27\lib\site-packages\numpy\random\mtrand.pyd in
> mtrand.RandomState.binomial (numpy\random\mtrand\mtrand.c:13770)()
>
> OverflowError: Python int too large to convert to C long
>
>
> On systems where C longs are 64 bit, this is likely not to produce an
> error.
>
> This begs the question if also randomkit.c and districutions.c should be
> changed to use npy_intp for consistency across all platforms.

There are two different uses of long that you need to distinguish. One
is for sizes, and one is for parameters and values. The sizes should
definitely be upgraded to npy_intp. The latter shouldn't; these should
remain as the default integer type of Python and numpy, a C long.

The reason longs are used for sizes is that I wrote mtrand for Numeric
and Python 2.4 before numpy was even announced (and I don't think we
had npy_intp at the time I merged it into numpy, but I could be
wrong). Using longs for sizes was the order of the day. I don't think
I had even touched a 64-bit machine that wasn't a DEC Alpha at the
time, so I knew very little about the issues.

So yes, please, fix whatever you can.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Robert Kern
On Tue, Jan 24, 2012 at 08:47, Sturla Molden  wrote:

> The coding is also inconsistent, compare for example:
>
> https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L180
>
> https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L201

I'm sorry, what are you demonstrating there?

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Sturla Molden
On 24.01.2012 10:16, Robert Kern wrote:

> I'm sorry, what are you demonstrating there?

Both npy_intp and C long are used for sizes and indexing.

Sturla
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Robert Kern
On Tue, Jan 24, 2012 at 09:19, Sturla Molden  wrote:
> On 24.01.2012 10:16, Robert Kern wrote:
>
>> I'm sorry, what are you demonstrating there?
>
> Both npy_intp and C long are used for sizes and indexing.

Ah, yes. I think Travis added the multiiter code to cont1_array(),
which does broadcasting, so he used npy_intp as is proper (and
necessary to pass into the multiiter API). The other functions don't
do broadcasting, so he didn't touch them.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Sturla Molden
On 24.01.2012 10:15, Robert Kern wrote:

> There are two different uses of long that you need to distinguish. One
> is for sizes, and one is for parameters and values. The sizes should
> definitely be upgraded to npy_intp. The latter shouldn't; these should
> remain as the default integer type of Python and numpy, a C long.

OK, that makes sense.

> The reason longs are used for sizes is that I wrote mtrand for Numeric
> and Python 2.4 before numpy was even announced (and I don't think we
> had npy_intp at the time I merged it into numpy, but I could be
> wrong). Using longs for sizes was the order of the day. I don't think
> I had even touched a 64-bit machine that wasn't a DEC Alpha at the
> time, so I knew very little about the issues.


On amd64 the "native" datatypes are actually a 64 bit pointer with a 32 
bit offset (contrary to what we see in Python and NumPy C sources), 
which is one reason why C longs are still 32 bits in MSVC. Thus an array 
size (size_t) should be 64 bits, but array indices (C long) should be 32 
bits. But nobody likes to code like that (e.g. we would need an extra 64 
bit pointer as cursor if the buffer size overflows a C long), and I 
don't think using a non-native 64-bit offset incur a lot of extra 
overhead for the CPU.

:-)

Sturla
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Strange error raised by scipy.special.erf

2012-01-24 Thread Pierre Haessig
Le 22/01/2012 11:28, Nadav Horesh a écrit :
> >>> special.erf(26.5)
> 1.0
> >>> special.erf(26.6)
> Traceback (most recent call last):
>   File "", line 1, in 
> special.erf(26.6)
> FloatingPointError: underflow encountered in erf
> >>> special.erf(26.7)
> 1.0
>  
I can confirm the same behaviour with numpy 1.5.1/scipy 0.9.0.
Indeed, 26.5 and 26.7 work, while 26.6 raises the underflow... weird
enough!
-- 
Pierre
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] einsum evaluation order

2012-01-24 Thread Søren Gammelmark
Dear all,

I was just looking into numpy.einsum and encountered an issue which might
be worth pointing out in the documentation.

Let us say you wish to evaluate something like this (repeated indices are
summed):

D[alpha, alphaprime] = A[alpha, beta, sigma] * B[alphaprime, betaprime,
sigma] * C[beta, betaprime]

with einsum as

einsum('abs,cds,bd->ac', A, B, C)

then it is not exactly clear in which order einsum evaluates the contractions
(or whether it does them in one go). This can be very important, since you can
do it in several ways, one of which has the least computational complexity.
The most efficient way of doing it is to contract e.g. A and C and then
contract that with B (or exchange A and B). A quick test on my laptop says
2.6 s with einsum and 0.13 s for two tensordots, with A and B being D x D x 2
and C being D x D for D = 96. This scaling seems to explode for higher
dimensions, whereas it is much better with the two independent contractions
(I believe it should be O(D^3)). For D = 512 I could do it in 5 s with two
contractions, whereas I stopped waiting after 60 s for einsum (I guess
einsum is probably O(D^4) in this case).
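
For concreteness, here is a minimal sketch of the two-step contraction
described above (shapes follow the D x D x 2 example; the comparison against
einsum is only there to show the two evaluations agree):

import numpy as np

D = 96
A = np.random.rand(D, D, 2)   # A[alpha, beta, sigma]
B = np.random.rand(D, D, 2)   # B[alphaprime, betaprime, sigma]
C = np.random.rand(D, D)      # C[beta, betaprime]

# Single einsum call (naive evaluation, roughly O(D^4) work):
D1 = np.einsum('abs,cds,bd->ac', A, B, C)

# Two pairwise contractions, roughly O(D^3) work: first A with C over beta,
# then the intermediate with B over (sigma, betaprime).
T = np.tensordot(A, C, axes=([1], [0]))          # T[alpha, sigma, betaprime]
D2 = np.tensordot(T, B, axes=([1, 2], [2, 1]))   # D2[alpha, alphaprime]

print(np.allclose(D1, D2))   # True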

I had in fact thought of making a function similar to einsum for a while,
but dropped the idea after I noticed einsum. I think, however, that there
might still be room for a tool that evaluates more complicated expressions
efficiently. I think the best way would be for the user to enter an
expression like the one above, which is then evaluated in the optimal order.
I know how to do this (theoretically) if all the repeated indices occur only
twice (like the expression above), but for the more general expressions
supported by einsum I am not sure how to do it (I haven't thought about it).
Here I am thinking about stuff like x[i] = a[i] * b[i] and its more general
counterparts (at first glance this seems to be a simpler problem than full
contractions). Do you think there is a need/interest for this kind of thing?
In that case I would like to write it / help write it. Much of it, I think,
can be reduced to decomposing the expression into existing numpy operations
(e.g. tensordot). How to incorporate issues of storage layout etc., however,
I have no idea.

In any case, I think it would be nice for the docs to state explicitly how
the expression in einsum is evaluated.

Søren Gammelmark
PhD-student
Department of Physics and Astronomy
Aarhus University
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Unexpected behavior with np.min_scalar_type

2012-01-24 Thread Kathleen M Tacina
I was experimenting with np.min_scalar_type to make sure it worked as
expected, and found some unexpected results for integers between 2**63
and 2**64-1.  I would have expected np.min_scalar_type(2**64-1) to
return uint64.  Instead, I get object.  Further experimenting showed
that the largest integer for which np.min_scalar_type will return uint64
is 2**63-1.  Is this expected behavior?

On python 2.7.2 on a 64-bit linux machine:
>>> import numpy as np
>>> np.version.full_version
'2.0.0.dev-55472ca'
>>> np.min_scalar_type(2**8-1)
dtype('uint8')
>>> np.min_scalar_type(2**16-1)
dtype('uint16')
>>> np.min_scalar_type(2**32-1)
dtype('uint32')
>>> np.min_scalar_type(2**64-1)
dtype('O')
>>> np.min_scalar_type(2**63-1)
dtype('uint64')
>>> np.min_scalar_type(2**63)
dtype('O')

I get the same results on a Windows XP  machine running python 2.7.2 and
numpy 1.6.1. 

Kathy  
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Strange error raised by scipy.special.erf

2012-01-24 Thread Nadav Horesh
I filed a ticket (#1590).

 Thank you for the verification.

   Nadav.

From: numpy-discussion-boun...@scipy.org [numpy-discussion-boun...@scipy.org] 
On Behalf Of Pierre Haessig [pierre.haes...@crans.org]
Sent: 24 January 2012 16:01
To: numpy-discussion@scipy.org
Subject: Re: [Numpy-discussion] Strange error raised by scipy.special.erf

Le 22/01/2012 11:28, Nadav Horesh a écrit :
> >>> special.erf(26.5)
> 1.0
> >>> special.erf(26.6)
> Traceback (most recent call last):
>   File "", line 1, in 
> special.erf(26.6)
> FloatingPointError: underflow encountered in erf
> >>> special.erf(26.7)
> 1.0
>
I can confirm this same behaviour with numpy 1.5.1/scipy 0.9.0
Indeed 26.5 and 26.7 works, while 26.6 raises the underflow... weird
enough !
--
Pierre
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread David Warde-Farley
On Tue, Jan 24, 2012 at 09:15:01AM +, Robert Kern wrote:
> On Tue, Jan 24, 2012 at 08:37, Sturla Molden  wrote:
> > On 24.01.2012 09:21, Sturla Molden wrote:
> >
> >> randomkit.c handles C long correctly, I think. There are different codes
> >> for 32 and 64 bit C long, and buffer sizes are size_t.
> >
> > distributions.c take C longs as parameters e.g. for the binomial
> > distribution. mtrand.pyx correctly handles this, but it can give an
> > unexpected overflow error on 64-bit Windows:
> >
> >
> > In [1]: np.random.binomial(2**31, .5)
> > ---
> > OverflowError                             Traceback (most recent call last)
> > C:\Windows\system32\ in ()
> > > 1 np.random.binomial(2**31, .5)
> >
> > C:\Python27\lib\site-packages\numpy\random\mtrand.pyd in
> > mtrand.RandomState.binomial (numpy\random\mtrand\mtrand.c:13770)()
> >
> > OverflowError: Python int too large to convert to C long
> >
> >
> > On systems where C longs are 64 bit, this is likely not to produce an
> > error.
> >
> > This begs the question if also randomkit.c and districutions.c should be
> > changed to use npy_intp for consistency across all platforms.
> 
> There are two different uses of long that you need to distinguish. One
> is for sizes, and one is for parameters and values. The sizes should
> definitely be upgraded to npy_intp. The latter shouldn't; these should
> remain as the default integer type of Python and numpy, a C long.

Hmm. Seeing as the width of a C long is inconsistent, does this imply that
the random number generator will produce different results on different
platforms? Or do the state dynamics prevent it from ever growing in magnitude
to the point where that's an issue?

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread David Warde-Farley
On Tue, Jan 24, 2012 at 06:00:05AM +0100, Sturla Molden wrote:
> Den 23.01.2012 22:08, skrev Christoph Gohlke:
> >
> > Maybe this explains the win-amd64 behavior: There are a couple of places
> > in mtrand where array indices and sizes are C long instead of npy_intp,
> > for example in the randint function:
> >
> > 
> >
> >
> 
> Both i and length could overflow here. It should overflow on allocation 
> of more than 2 GB.
> 
> There is also a lot of C longs in the internal state (line 55-105), as 
> well as the other functions.
> 
> Producing 2 GB of random ints twice fails:

Sturla, since you seem to have access to Win64 machines, do you suppose you
could try this code:

>>> a = numpy.ones((1, 972))
>>> b = numpy.zeros((4993210,), dtype=int)
>>> c = a[b]

and verify that there's a whole lot of 0s in the matrix, specifically,

>>> c[574519:].sum()
356.0
>>> c[574520:].sum()
0.0

is the case on Linux 64-bit; is it the case on Windows 64?

Thanks a lot,

David

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Robin
On Tue, Jan 24, 2012 at 6:24 PM, David Warde-Farley
 wrote:
> On Tue, Jan 24, 2012 at 06:00:05AM +0100, Sturla Molden wrote:
>> Den 23.01.2012 22:08, skrev Christoph Gohlke:
>> >
>> > Maybe this explains the win-amd64 behavior: There are a couple of places
>> > in mtrand where array indices and sizes are C long instead of npy_intp,
>> > for example in the randint function:
>> >
>> > 
>> >
>> >
>>
>> Both i and length could overflow here. It should overflow on allocation
>> of more than 2 GB.
>>
>> There is also a lot of C longs in the internal state (line 55-105), as
>> well as the other functions.
>>
>> Producing 2 GB of random ints twice fails:
>
> Sturla, since you seem to have access to Win64 machines, do you suppose you
> could try this code:
>
> >>> a = numpy.ones((1, 972))
> >>> b = numpy.zeros((4993210,), dtype=int)
> >>> c = a[b]
>
> and verify that there's a whole lot of 0s in the matrix, specifically,
>
> >>> c[574519:].sum()
> 356.0
> >>> c[574520:].sum()
> 0.0
>
> is the case on Linux 64-bit; is it the case on Windows 64?

Yes - I get exactly the same numbers in 64 bit windows with 1.6.1.

Cheers

Robin
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread David Warde-Farley
On Tue, Jan 24, 2012 at 06:37:12PM +0100, Robin wrote:

> Yes - I get exactly the same numbers in 64 bit windows with 1.6.1.

Alright, so that rules out platform specific effects.

I'll try and hunt the bug down when I have some time, if someone more
familiar with the indexing code doesn't beat me to it.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread K . -Michael Aye
I know I know, that's pretty outrageous to even suggest, but please 
bear with me, I am stumped as you may be:

2-D data file here:
http://dl.dropbox.com/u/139035/data.npy

Then:
In [3]: data.mean()
Out[3]: 3067.024383998

In [4]: data.max()
Out[4]: 3052.4343

In [5]: data.shape
Out[5]: (1000, 1000)

In [6]: data.min()
Out[6]: 3040.498

In [7]: data.dtype
Out[7]: dtype('float32')


A mean value calculated in a loop over the data gives me 3045.747251076416.
I first thought I still misunderstood how data.mean() works, per axis
and so on, but I did the same with a flattened version, with the same
results.

Am I really so tired that I can't see what I am doing wrong here?
For completeness, the data was read by an osgeo.gdal dataset method called
ReadAsArray().
My numpy.__version__ gives me 1.6.1 and my whole setup is based on
Enthought's EPD.

Best regards,
Michael



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Bruce Southey
On 01/24/2012 12:33 PM, K.-Michael Aye wrote:
> I know I know, that's pretty outrageous to even suggest, but please
> bear with me, I am stumped as you may be:
>
> 2-D data file here:
> http://dl.dropbox.com/u/139035/data.npy
>
> Then:
> In [3]: data.mean()
> Out[3]: 3067.024383998
>
> In [4]: data.max()
> Out[4]: 3052.4343
>
> In [5]: data.shape
> Out[5]: (1000, 1000)
>
> In [6]: data.min()
> Out[6]: 3040.498
>
> In [7]: data.dtype
> Out[7]: dtype('float32')
>
>
> A mean value calculated per loop over the data gives me 3045.747251076416
> I first thought I still misunderstand how data.mean() works, per axis
> and so on, but did the same with a flattenend version with the same
> results.
>
> Am I really soo tired that I can't see what I am doing wrong here?
> For completion, the data was read by a osgeo.gdal dataset method called
> ReadAsArray()
> My numpy.__version__ gives me 1.6.1 and my whole setup is based on
> Enthought's EPD.
>
> Best regards,
> Michael
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
You have a million 32-bit floating point numbers that are in the
thousands. Thus you are exceeding the 32-bit float precision and, if you
can, you need to increase the precision of the accumulator in np.mean() or
change the input dtype:
 >>> a.mean(dtype=np.float32) # default and lacks precision
3067.024383998
 >>> a.mean(dtype=np.float64)
3045.747251076416
 >>> a.mean(dtype=np.float128)
3045.7472510764160156
 >>> b=a.astype(np.float128)
 >>> b.mean()
3045.7472510764160156

Otherwise you are left with using some alternative approach to calculate
the mean.

Bruce





___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Kathleen M Tacina
I have confirmed this on a 64-bit linux machine running python 2.7.2
with the development version of numpy.  It seems to be related to using
float32 instead of float64.   If the array is first converted to a
64-bit float (via astype), mean gives an answer that agrees with your
looped-calculation value: 3045.747250002.  With the original 32-bit
array, averaging successively on one axis and then on the other gives
answers that agree with the 64-bit float answer to the second decimal
place.


In [125]: d = np.load('data.npy')

In [126]: d.mean()
Out[126]: 3067.024383998

In [127]: d64 = d.astype('float64')

In [128]: d64.mean()
Out[128]: 3045.747251076416

In [129]: d.mean(axis=0).mean()
Out[129]: 3045.748750002

In [130]: d.mean(axis=1).mean()
Out[130]: 3045.74448

In [131]: np.version.full_version
Out[131]: '2.0.0.dev-55472ca'



--
On Tue, 2012-01-24 at 12:33 -0600, K.-Michael Aye wrote:

> I know I know, that's pretty outrageous to even suggest, but please 
> bear with me, I am stumped as you may be:
> 
> 2-D data file here:
> http://dl.dropbox.com/u/139035/data.npy
> 
> Then:
> In [3]: data.mean()
> Out[3]: 3067.024383998
> 
> In [4]: data.max()
> Out[4]: 3052.4343
> 
> In [5]: data.shape
> Out[5]: (1000, 1000)
> 
> In [6]: data.min()
> Out[6]: 3040.498
> 
> In [7]: data.dtype
> Out[7]: dtype('float32')
> 
> 
> A mean value calculated per loop over the data gives me 3045.747251076416
> I first thought I still misunderstand how data.mean() works, per axis 
> and so on, but did the same with a flattenend version with the same 
> results.
> 
> Am I really soo tired that I can't see what I am doing wrong here?
> For completion, the data was read by a osgeo.gdal dataset method called 
> ReadAsArray()
> My numpy.__version__ gives me 1.6.1 and my whole setup is based on 
> Enthought's EPD.
> 
> Best regards,
> Michael
> 
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

-- 
--
Kathleen M. Tacina
NASA Glenn Research Center
MS 5-10
21000 Brookpark Road
Cleveland, OH 44135
Telephone: (216) 433-6660
Fax: (216) 433-5802
--
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Zachary Pincus

On Jan 24, 2012, at 1:33 PM, K.-Michael Aye wrote:

> I know I know, that's pretty outrageous to even suggest, but please 
> bear with me, I am stumped as you may be:
> 
> 2-D data file here:
> http://dl.dropbox.com/u/139035/data.npy
> 
> Then:
> In [3]: data.mean()
> Out[3]: 3067.024383998
> 
> In [4]: data.max()
> Out[4]: 3052.4343
> 
> In [5]: data.shape
> Out[5]: (1000, 1000)
> 
> In [6]: data.min()
> Out[6]: 3040.498
> 
> In [7]: data.dtype
> Out[7]: dtype('float32')
> 
> 
> A mean value calculated per loop over the data gives me 3045.747251076416
> I first thought I still misunderstand how data.mean() works, per axis 
> and so on, but did the same with a flattenend version with the same 
> results.
> 
> Am I really soo tired that I can't see what I am doing wrong here?
> For completion, the data was read by a osgeo.gdal dataset method called 
> ReadAsArray()
> My numpy.__version__ gives me 1.6.1 and my whole setup is based on 
> Enthought's EPD.


I get the same result:

In [1]: import numpy

In [2]: data = numpy.load('data.npy')

In [3]: data.mean()
Out[3]: 3067.024383998

In [4]: data.max()
Out[4]: 3052.4343

In [5]: data.min()
Out[5]: 3040.498

In [6]: numpy.version.version
Out[6]: '2.0.0.dev-433b02a'

This is on OS X 10.7.2 with Python 2.7.1, on an Intel Core i7. Running python as a 
32- vs. 64-bit process doesn't make a difference.

The data matrix doesn't look too strange when I view it as an image -- all 
pretty smooth variation around the (min, max) range. But maybe it's still 
somehow floating-point pathological?

This is fun too:
In [12]: data.mean()
Out[12]: 3067.024383998

In [13]: (data/3000).mean()*3000
Out[13]: 3020.807437501

In [15]: (data/2).mean()*2
Out[15]: 3067.024383998

In [16]: (data/200).mean()*200
Out[16]: 3013.67541


Zach


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Val Kalatsky
Just what Bruce said.

You can run the following to confirm:
np.mean(data - data.mean())

If for some reason you do not want to convert to float64 you can add the
result of the previous line to the "bad" mean:
bad_mean = data.mean()
good_mean = bad_mean + np.mean(data - bad_mean)
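
For concreteness, a small self-contained version of this trick on synthetic
float32 data (the values below are made up and only stand in for the posted
file):

import numpy as np

# Synthetic stand-in for the posted data: a million float32 values near 3000.
data = (3000.0 + 50.0 * np.random.rand(1000, 1000)).astype(np.float32)

bad_mean = data.mean()                           # float32 accumulation drifts
good_mean = bad_mean + np.mean(data - bad_mean)  # residuals are small, so this recovers precision
ref_mean = data.mean(dtype=np.float64)           # double-precision reference

print(bad_mean, good_mean, ref_mean)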

Val

On Tue, Jan 24, 2012 at 12:33 PM, K.-Michael Aye wrote:

> I know I know, that's pretty outrageous to even suggest, but please
> bear with me, I am stumped as you may be:
>
> 2-D data file here:
> http://dl.dropbox.com/u/139035/data.npy
>
> Then:
> In [3]: data.mean()
> Out[3]: 3067.024383998
>
> In [4]: data.max()
> Out[4]: 3052.4343
>
> In [5]: data.shape
> Out[5]: (1000, 1000)
>
> In [6]: data.min()
> Out[6]: 3040.498
>
> In [7]: data.dtype
> Out[7]: dtype('float32')
>
>
> A mean value calculated per loop over the data gives me 3045.747251076416
> I first thought I still misunderstand how data.mean() works, per axis
> and so on, but did the same with a flattenend version with the same
> results.
>
> Am I really soo tired that I can't see what I am doing wrong here?
> For completion, the data was read by a osgeo.gdal dataset method called
> ReadAsArray()
> My numpy.__version__ gives me 1.6.1 and my whole setup is based on
> Enthought's EPD.
>
> Best regards,
> Michael
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Zachary Pincus
> You have a million 32-bit floating point numbers that are in the 
> thousands. Thus you are exceeding the 32-bitfloat precision and, if you 
> can, you need to increase precision of the accumulator in np.mean() or 
> change the input dtype:
> >>> a.mean(dtype=np.float32) # default and lacks precision
> 3067.024383998
> >>> a.mean(dtype=np.float64)
> 3045.747251076416
> >>> a.mean(dtype=np.float128)
> 3045.7472510764160156
> >>> b=a.astype(np.float128)
> >>> b.mean()
> 3045.7472510764160156
> 
> Otherwise you are left to using some alternative approach to calculate 
> the mean.
> 
> Bruce

Interesting -- I knew that float64 accumulators were used with integer arrays, 
and I had just assumed that 64-bit or higher accumulators would be used with 
floating-point arrays too, instead of the array's dtype. This is actually quite 
a bit of a gotcha for floating-point imaging-type tasks -- good to know!

Zach
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread K . -Michael Aye
Thank you Bruce and all,
I knew I was doing something wrong (I should have read the mean method
doc more closely). I am of course glad that it's so easily understandable.
But: if the error can get this big, wouldn't it be a better idea for the
accumulator to always be of type 'float64' and then convert back later to
the type of the original array?
As one can see in this case, the result would be much closer to the true value.


Michael


On 2012-01-24 19:01:40 +, Val Kalatsky said:



Just what Bruce said. 

You can run the following to confirm:
np.mean(data - data.mean())

If for some reason you do not want to convert to float64 you can add 
the result of the previous line to the "bad" mean:

bad_mean = data.mean()
good_mean = bad_mean + np.mean(data - bad_mean)

Val

On Tue, Jan 24, 2012 at 12:33 PM, K.-Michael Aye 
 wrote:

I know I know, that's pretty outrageous to even suggest, but please
bear with me, I am stumped as you may be:

2-D data file here:
http://dl.dropbox.com/u/139035/data.npy

Then:
In [3]: data.mean()
Out[3]: 3067.024383998

In [4]: data.max()
Out[4]: 3052.4343

In [5]: data.shape
Out[5]: (1000, 1000)

In [6]: data.min()
Out[6]: 3040.498

In [7]: data.dtype
Out[7]: dtype('float32')


A mean value calculated per loop over the data gives me 3045.747251076416
I first thought I still misunderstand how data.mean() works, per axis
and so on, but did the same with a flattenend version with the same
results.

Am I really soo tired that I can't see what I am doing wrong here?
For completion, the data was read by a osgeo.gdal dataset method called
ReadAsArray()
My numpy.__version__ gives me 1.6.1 and my whole setup is based on
Enthought's EPD.

Best regards,
Michael



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Course "Python for Scientists and Engineers" in Chicago

2012-01-24 Thread Mike Müller
Course "Python for Scientists and Engineers" in Chicago
===

There will be a comprehensive Python course for scientists and engineers
in Chicago at the end of February / beginning of March 2012. It consists of
a 3-day intro section and a 2-day advanced section. Both sections can be
taken separately or combined.

More details below and here: http://www.dabeaz.com/chicago/science.html

Please let friends or colleagues who might be interested in such a
course know about it.


3-Day Intro Section
---

- Overview of Scientific and Technical Libraries for Python.
- Numerical Calculations with NumPy
- Storage and Processing of Large Amounts of Data
- Graphical Presentation of Scientific Data with matplotlib
- Object Oriented Programming for Scientific and Technical Projects
- Open Time for Problem Solving


2-Day Advanced Section
--

- Extending Python with Other Languages
- Unit Testing
- Version Control with Mercurial


The Details
---

The course is hosted by David Beazley (http://www.dabeaz.com).

Date: Feb 27 - Mar 2, 2012
Location: Chicago, IL, USA
Trainer: Mike Müller
Course Language: English
Link: http://www.dabeaz.com/chicago/science.html
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread David Warde-Farley
On Tue, Jan 24, 2012 at 01:02:44PM -0500, David Warde-Farley wrote:
> On Tue, Jan 24, 2012 at 06:37:12PM +0100, Robin wrote:
> 
> > Yes - I get exactly the same numbers in 64 bit windows with 1.6.1.
> 
> Alright, so that rules out platform specific effects.
> 
> I'll try and hunt the bug down when I have some time, if someone more
> familiar with the indexing code doesn't beat me to it.

I've figured it out. In numpy/core/src/multiarray/mapping.c, PyArray_GetMap
is using an int for a counter variable where it should be using an npy_intp.

I've filed a pull request at https://github.com/numpy/numpy/pull/188 with a
regression test.
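
As a rough consistency check (assuming the element count simply wraps at 32
bits), the cut-off reported earlier in the thread lines up exactly:

# The result c = a[b] should hold 4993210 * 972 elements, which does not
# fit in 32 bits; only the part surviving the wrap-around gets written.
total = 4993210 * 972            # 4,853,400,120 elements in the full result
wrapped = total % 2**32          # 558,432,824 elements actually written

rows, extra = divmod(wrapped, 972)
print("%d full rows + %d elements" % (rows, extra))   # 574519 full rows + 356 elements
# This matches c[574519:].sum() == 356.0 and c[574520:].sum() == 0.0 above.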

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Samuel John

On 23.01.2012, at 11:23, David Warde-Farley wrote:
>> a = numpy.array(numpy.random.randint(256,size=(500,972)),dtype='uint8')
>> b = numpy.random.randint(500,size=(4993210,))
>> c = a[b]
>> In [14]: c[100:].sum()
>> Out[14]: 0

Same here.

Python 2.7.2, 64bit, Mac OS X (Lion), 8GB RAM, numpy.__version__ = 
2.0.0.dev-55472ca
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.1.00)]
Numpy built without llvm.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Unexpected behavior with np.min_scalar_type

2012-01-24 Thread Samuel John
I get the same results as you, Kathy.
*surprised*

(On OS X (Lion), 64-bit, numpy 2.0.0.dev-55472ca, Python 2.7.2.)


On 24.01.2012, at 16:29, Kathleen M Tacina wrote:

> I was experimenting with np.min_scalar_type to make sure it worked as 
> expected, and found some unexpected results for integers between 2**63 and 
> 2**64-1.  I would have expected np.min_scalar_type(2**64-1) to return uint64. 
>  Instead, I get object.  Further experimenting showed that the largest 
> integer for which np.min_scalar_type will return uint64 is 2**63-1.  Is this 
> expected behavior?
> 
> On python 2.7.2 on a 64-bit linux machine:
> >>> import numpy as np
> >>> np.version.full_version
> '2.0.0.dev-55472ca'
> >>> np.min_scalar_type(2**8-1)
> dtype('uint8')
> >>> np.min_scalar_type(2**16-1)
> dtype('uint16')
> >>> np.min_scalar_type(2**32-1)
> dtype('uint32')
> >>> np.min_scalar_type(2**64-1)
> dtype('O')
> >>> np.min_scalar_type(2**63-1)
> dtype('uint64')
> >>> np.min_scalar_type(2**63)
> dtype('O')
> 
> I get the same results on a Windows XP  machine running python 2.7.2 and 
> numpy 1.6.1. 
> 
> Kathy 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 'Advanced' save and restore operation

2012-01-24 Thread Samuel John
I know you wrote that you want "TEXT" files, but nevertheless I'd like to 
point to http://code.google.com/p/h5py/ .
There are viewers for hdf5, and it is stable and widely used.
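
A minimal sketch of how this could look (the file name, field names and
parameters below are made up; the point is that the dtype travels with the
data):

import numpy as np
import h5py

# A structured array plus a dictionary of parameters (names are made up).
records = np.array([(1, 2.5), (2, 3.5)],
                   dtype=[('id', 'i4'), ('value', 'f8')])
params = {'temperature': 300.0, 'label': 'run-1'}

f = h5py.File('results.h5', 'w')
f.create_dataset('records', data=records)    # the structured dtype is stored too
for key, value in params.items():
    f['records'].attrs[key] = value          # parameters become HDF5 attributes
f.close()

f = h5py.File('results.h5', 'r')
restored = f['records'][...]
print(restored.dtype)                        # structured dtype comes back intact
f.close()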

 Samuel


On 24.01.2012, at 00:26, Emmanuel Mayssat wrote:

> After having saved data, I need to know/remember the data dtype to
> restore it correctly.
> Is there a way to save the dtype with the data?
> (I guess the header parameter of savedata could help, but they are
> only available in v2.0+ )
> 
> I would like to save several related structured array and a dictionary
> of parameters into a TEXT file.
> Is there an easy way to do that?
> (maybe xml file, or maybe archive zip file of other files, or . )
> 
> Any recommendation is helpful.
> 
> Regards,
> --
> Emmanuel
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] installing matplotlib in MacOs 10.6.8.

2012-01-24 Thread Samuel John
Sorry for the late answer. But at least for the record:

If you are using Eclipse, I assume you have also installed the Eclipse plugin 
[pydev](http://pydev.org/). I use it myself; it's good.
Then you have to go to Preferences -> PyDev -> Python Interpreter and select 
the Python version you want to use by searching for the "Python" executable.

I am not familiar with the pre-built versions of matplotlib. Perhaps they are 
missing the 64-bit Intel versions?
Perhaps you can find a lib (.so file) in matplotlib and use the "file" command 
to see the architectures it was built for.
You should also be able to install matplotlib with `pip install matplotlib` 
(if you have pip).

Samuel

On 26.12.2011, at 06:40, Alex Ter-Sarkissov wrote:

> hi everyone, I run python 2.7.2. in Eclipse (recently upgraded from 2.6). I 
> have a problem with installing matplotlib (I found the version for python 
> 2.7. MacOs 10.3, no later versions). If I run python in terminal using arch 
> -i386 python, and then 
> 
> from matplotlib.pylab import *
> 
> and similar stuff, everything works fine. If I run python in eclipse or just 
> without arch -i386, I can import matplotlib as 
> 
> from matplotlib import  *
> 
> but actually nothing gets imported. If I do it in the same way as above, I 
> get the message
> 
> no matching architecture in universal wrapper
> 
> which means there's conflict of versions or something like that. I tried 
> reinstalling the interpreter and adding matplotlib to forced built-ins, but 
> nothing helped. For some reason I didn't have this problem with numpy and 
> tkinter. 
> 
> Any suggestions are appreciated. 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread eat
Hi,

Oddly, numpy 1.6 seems to behave in a more consistent manner:

In []: sys.version
Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]'
In []: np.version.version
Out[]: '1.6.0'

In []: d= np.load('data.npy')
In []: d.dtype
Out[]: dtype('float32')

In []: d.mean()
Out[]: 3045.74718
In []: d.mean(dtype= np.float32)
Out[]: 3045.74718
In []: d.mean(dtype= np.float64)
Out[]: 3045.747251076416
In []: (d- d.min()).mean()+ d.min()
Out[]: 3045.7472508750002
In []: d.mean(axis= 0).mean()
Out[]: 3045.74724
In []: d.mean(axis= 1).mean()
Out[]: 3045.74724

Or do the results of the calculations depend more on the platform?


My 2 cents,
eat
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Kathleen M Tacina
I found something similar, with a very simple example.

On 64-bit linux, python 2.7.2, numpy development version:

In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32)

In [23]: a.mean()
Out[23]: 4034.16357421875

In [24]: np.version.full_version
Out[24]: '2.0.0.dev-55472ca'


But a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives:
>>> a = 4000*np.ones((1024,1024),dtype=np.float32)
>>> a.mean()
4000.0
>>>np.version.full_version
'1.6.1'


On Tue, 2012-01-24 at 17:12 -0600, eat wrote:

> Hi,
> 
> 
> 
> Oddly, but numpy 1.6 seems to behave more consistent manner:
> 
> 
> In []: sys.version
> Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit
> (Intel)]'
> In []: np.version.version
> Out[]: '1.6.0'
> 
> 
> In []: d= np.load('data.npy')
> In []: d.dtype
> Out[]: dtype('float32')
> 
> 
> In []: d.mean()
> Out[]: 3045.74718
> In []: d.mean(dtype= np.float32)
> Out[]: 3045.74718
> In []: d.mean(dtype= np.float64)
> Out[]: 3045.747251076416
> In []: (d- d.min()).mean()+ d.min()
> Out[]: 3045.7472508750002
> In []: d.mean(axis= 0).mean()
> Out[]: 3045.74724
> In []: d.mean(axis= 1).mean()
> Out[]: 3045.74724
> 
> 
> Or does the results of calculations depend more on the platform?
> 
> 
> 
> 
> My 2 cents,
> eat

-- 
--
Kathleen M. Tacina
NASA Glenn Research Center
MS 5-10
21000 Brookpark Road
Cleveland, OH 44135
Telephone: (216) 433-6660
Fax: (216) 433-5802
--
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread David Warde-Farley
On Wed, Jan 25, 2012 at 01:12:06AM +0200, eat wrote:

> Or does the results of calculations depend more on the platform?

Floating point operations often do, sadly (not saying that this is the case
here, but you'd need to try both versions on the same machine [or at least
architecture/bit-width]/same platform to be certain).

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] einsum evaluation order

2012-01-24 Thread Mark Wiebe
On Tue, Jan 24, 2012 at 6:32 AM, Søren Gammelmark wrote:

> Dear all,
>
> I was just looking into numpy.einsum and encountered an issue which might
> be worth pointing out in the documentation.
>
> Let us say you wish to evaluate something like this (repeated indices a
> summed)
>
> D[alpha, alphaprime] = A[alpha, beta, sigma] * B[alphaprime, betaprime,
> sigma] * C[beta, betaprime]
>
> with einsum as
>
> einsum('abs,cds,bd->ac', A, B, C)
>
> then it is not exactly clear which order einsum evaluates the contractions
> (or if it does it in one go). This can be very important since you can do
> it in several ways, one of which has the least computational complexity.
> The most efficient way of doing it is to contract e.g. A and C and then
> contract that with B (or exchange A and B). A quick test on my labtop says
> 2.6 s with einsum and 0.13 s for two tensordots with A and B being D x D x
> 2 and C is D x D for D = 96. This scaling seems to explode for higher
> dimensions, whereas it is much better with the two independent contractions
> (i believe it should be O(D^3)).For D = 512 I could do it in 5 s with two
> contractions, whereas I stopped waiting after 60 s for einsum (i guess
> einsum probably is O(D^4) in this case).
>

You are correct, einsum presently uses the most naive evaluation.


> I had in fact thought of making a function similar to einsum for a while,
> but after I noticed it dropped it. I think, however, that there might still
> be room for a tool for evaluating more complicated expressions efficiently.
> I think the best way would be for the user to enter an expression like the
> one above which is then evaluated in the optimal order. I know how to do
> this (theoretically) if all the repeated indices only occur twice (like the
> expression above), but for the more general expression supported by einsum
> I om not sure how to do it (haven't thought about it). Here I am thinking
> about stuff like x[i] = a[i] * b[i] and their more general counterparts (at
> first glance this seems to be a simpler problem than full contractions). Do
> you think there is a need/interest for this kind of thing? In that case I
> would like the write it / help write it. Much of it, I think, can be
> reduced to decomposing the expression into existing numpy operations (e.g.
> tensordot). How to incorporate issues of storage layout etc, however, I
> have no idea.
>

I think a good approach would be to modify einsum so it decomposes the
expression into multiple products. It may even just be a simple dynamic
programming problem, but I haven't given it much thought.
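
As a toy illustration of what such a decomposition pass might do (this is not
numpy code; it assumes every index has the same dimension D and that repeated
indices occur exactly twice):

from itertools import combinations

def pair_cost(idx1, idx2, D):
    # Cost of contracting two operands ~ D ** (number of distinct indices involved).
    return D ** len(set(idx1) | set(idx2))

def cheapest_first_pair(operand_indices, D):
    # Pick the pair of operands whose pairwise contraction is cheapest.
    pairs = combinations(range(len(operand_indices)), 2)
    return min(pairs, key=lambda p: pair_cost(operand_indices[p[0]],
                                              operand_indices[p[1]], D))

# For the 'abs,cds,bd->ac' example in this thread:
print(cheapest_first_pair(['abs', 'cds', 'bd'], 512))   # (0, 2): contract A with C first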

> In any case I think it might be nice to write explicitly how the expression
> in einsum is evaluated in the docs.
>

That's a good idea, yes.

Thanks,
Mark


>
> Søren Gammelmark
> PhD-student
> Department of Physics and Astronomy
> Aarhus University
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] numpy.percentile multiple arrays

2012-01-24 Thread questions anon
I need some help understanding how to loop through many arrays to calculate
the 95th percentile.
I can easily do this by using numpy.concatenate to make one big array and
then finding the 95th percentile using numpy.percentile but this causes a
memory error when I want to run this on 100's of netcdf files (see code
below).
Any alternative methods will be greatly appreciated.


import os
import numpy as N
from netCDF4 import Dataset

# MainFolder is assumed to be defined elsewhere.
all_TSFC = []
for (path, dirs, files) in os.walk(MainFolder):
    for dir in dirs:
        print dir
    path = path + '/'
    for ncfile in files:
        if ncfile[-3:] == '.nc':
            print "dealing with ncfiles:", ncfile
            ncfile = os.path.join(path, ncfile)
            ncfile = Dataset(ncfile, 'r+', 'NETCDF4')
            TSFC = ncfile.variables['T_SFC'][:]
            ncfile.close()
            all_TSFC.append(TSFC)

big_array = N.ma.concatenate(all_TSFC)
Percentile95th = N.percentile(big_array, 95, axis=0)
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread eat
Hi

On Wed, Jan 25, 2012 at 1:21 AM, Kathleen M Tacina <
kathleen.m.tac...@nasa.gov> wrote:

> **
> I found something similar, with a very simple example.
>
> On 64-bit linux, python 2.7.2, numpy development version:
>
> In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32)
>
> In [23]: a.mean()
> Out[23]: 4034.16357421875
>
> In [24]: np.version.full_version
> Out[24]: '2.0.0.dev-55472ca'
>
>
> But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives:
> >>>a = np.ones((1024,1024),dtype=np.float32)
> >>>a.mean()
> 4000.0
> >>>np.version.full_version
> '1.6.1'
>
This indeed looks very nasty, regardless of whether it is a version or
platform related problem.

-eat

>
>
>
> On Tue, 2012-01-24 at 17:12 -0600, eat wrote:
>
> Hi,
>
>
>
>  Oddly, but numpy 1.6 seems to behave more consistent manner:
>
>
>
>  In []: sys.version
>
>  Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit
> (Intel)]'
>
>  In []: np.version.version
>
>  Out[]: '1.6.0'
>
>
>
>  In []: d= np.load('data.npy')
>
>  In []: d.dtype
>
>  Out[]: dtype('float32')
>
>
>
>  In []: d.mean()
>
>  Out[]: 3045.74718
>
>  In []: d.mean(dtype= np.float32)
>
>  Out[]: 3045.74718
>
>  In []: d.mean(dtype= np.float64)
>
>  Out[]: 3045.747251076416
>
>  In []: (d- d.min()).mean()+ d.min()
>
>  Out[]: 3045.7472508750002
>
>  In []: d.mean(axis= 0).mean()
>
>  Out[]: 3045.74724
>
>  In []: d.mean(axis= 1).mean()
>
>  Out[]: 3045.74724
>
>
>
>  Or does the results of calculations depend more on the platform?
>
>
>
>
>
>  My 2 cents,
>
>  eat
>
>   --
> --
> Kathleen M. Tacina
> NASA Glenn Research Center
> MS 5-10
> 21000 Brookpark Road
> Cleveland, OH 44135
> Telephone: (216) 433-6660
> Fax: (216) 433-5802
> --
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-24 Thread Mark Wiebe
2012/1/21 Ondřej Čertík 

> 
>
> Let me know if you figure out something. I think the "mask" thing is
> quite slow, but the problem is that it needs to be there, to catch
> overflows (and it is there in Fortran as well, see the
> "where" statement, which does the same thing). Maybe there is some
> other way to write the same thing in NumPy?
>

In the current master, you can replace

z[mask] *= z[mask]
z[mask] += c[mask]

with

np.multiply(z, z, out=z, where=mask)
np.add(z, c, out=z, where=mask)

The performance of this alternate syntax is still not great, but it is
significantly faster than what it replaces. For a particular choice of
mask, I get

In [40]: timeit z[mask] *= z[mask]

10 loops, best of 3: 29.1 ms per loop

In [41]: timeit np.multiply(z, z, out=z, where=mask)

100 loops, best of 3: 4.2 ms per loop


-Mark


> Ondrej
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.percentile multiple arrays

2012-01-24 Thread Marc Shivers
This is probably not the best way to do it, but I think it would work:

You could take two passes through your data, first calculating and storing
the median for each file and the number of elements in each file.  From
those data, you can get a lower bound on the 95th percentile of the
combined dataset.  For example, if all the files are the same size, and
you've got 100 of them, then the 95th percentile of the full dataset would
be at least as large as the 90th percentile of the individual file median
values.  Once you've got that cut-off value, go back through your files and
just pull out the values larger than your cut-off value.  Then you'd just
need to figure out what percentile in this subset would correspond to the
95th percentile in the full dataset.
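
A rough sketch of this two-pass idea, reusing the names from the original
snippet (MainFolder, the 'T_SFC' variable); the quantile bookkeeping at the
end is approximate and assumes the cut-off really is a lower bound:

import os
import numpy as np
from netCDF4 import Dataset

ncfiles = [os.path.join(path, f)
           for path, dirs, files in os.walk(MainFolder)
           for f in files if f.endswith('.nc')]

# Pass 1: per-file medians and sizes give a lower bound on the global 95th.
medians, counts = [], []
for name in ncfiles:
    nc = Dataset(name, 'r')
    data = np.ma.compressed(nc.variables['T_SFC'][:])
    nc.close()
    medians.append(np.median(data))
    counts.append(data.size)
cutoff = np.percentile(medians, 90)      # the lower bound discussed above
total = float(sum(counts))

# Pass 2: keep only the values above the cut-off; this tail is much smaller.
tail = []
for name in ncfiles:
    nc = Dataset(name, 'r')
    data = np.ma.compressed(nc.variables['T_SFC'][:])
    nc.close()
    tail.append(data[data > cutoff])
tail = np.concatenate(tail)

# The global 95th percentile sits inside the tail; find its rank there.
rank_in_tail = 0.95 * total - (total - tail.size)
Percentile95th = np.percentile(tail, 100.0 * rank_in_tail / tail.size)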

HTH,
Marc



On Tue, Jan 24, 2012 at 7:22 PM, questions anon wrote:

> I need some help understanding how to loop through many arrays to
> calculate the 95th percentile.
> I can easily do this by using numpy.concatenate to make one big array and
> then finding the 95th percentile using numpy.percentile but this causes a
> memory error when I want to run this on 100's of netcdf files (see code
> below).
> Any alternative methods will be greatly appreciated.
>
>
> all_TSFC=[]
> for (path, dirs, files) in os.walk(MainFolder):
> for dir in dirs:
> print dir
> path=path+'/'
> for ncfile in files:
> if ncfile[-3:]=='.nc':
> print "dealing with ncfiles:", ncfile
> ncfile=os.path.join(path,ncfile)
> ncfile=Dataset(ncfile, 'r+', 'NETCDF4')
> TSFC=ncfile.variables['T_SFC'][:]
> ncfile.close()
> all_TSFC.append(TSFC)
>
> big_array=N.ma.concatenate(all_TSFC)
> Percentile95th=N.percentile(big_array, 95, axis=0)
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Unexpected behavior with np.min_scalar_type

2012-01-24 Thread Mark Wiebe
On Tue, Jan 24, 2012 at 7:29 AM, Kathleen M Tacina <
kathleen.m.tac...@nasa.gov> wrote:

> **
> I was experimenting with np.min_scalar_type to make sure it worked as
> expected, and found some unexpected results for integers between 2**63 and
> 2**64-1.  I would have expected np.min_scalar_type(2**64-1) to return
> uint64.  Instead, I get object.  Further experimenting showed that the
> largest integer for which np.min_scalar_type will return uint64 is
> 2**63-1.  Is this expected behavior?
>

This is a bug in how numpy detects the dtype of python objects.

https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/common.c#L18

You can see there it's only checking for a signed long long, not accounting
for the unsigned case. I created a ticket for you here:

http://projects.scipy.org/numpy/ticket/2028

-Mark


>
> On python 2.7.2 on a 64-bit linux machine:
> >>> import numpy as np
> >>> np.version.full_version
> '2.0.0.dev-55472ca'
> >>> np.min_scalar_type(2**8-1)
> dtype('uint8')
> >>> np.min_scalar_type(2**16-1)
> dtype('uint16')
> >>> np.min_scalar_type(2**32-1)
> dtype('uint32')
> >>> np.min_scalar_type(2**64-1)
> dtype('O')
> >>> np.min_scalar_type(2**63-1)
> dtype('uint64')
> >>> np.min_scalar_type(2**63)
> dtype('O')
>
> I get the same results on a Windows XP  machine running python 2.7.2 and
> numpy 1.6.1.
>
> Kathy
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fix for ticket #1973

2012-01-24 Thread Mark Wiebe
On Mon, Jan 16, 2012 at 8:14 AM, Charles R Harris  wrote:

>
>
> On Mon, Jan 16, 2012 at 8:52 AM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>>
>>
>> On Mon, Jan 16, 2012 at 8:37 AM, Bruce Southey wrote:
>>
>>> **
>>> On 01/14/2012 04:31 PM, Charles R Harris wrote:
>>>
>>> I've put up a pull request for a fix to ticket #1973. Currently the fix
>>> simply propagates the maskna flag when the *.astype method is called. A
>>> more complicated option would be to add a maskna keyword to specify whether
>>> the output is masked or not or propagates the type of the source, but that
>>> seems overly complex to me.
>>>
>>> Thoughts?
>>>
>>> Chuck
>>>
>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>  Thanks for the correction and as well as the fix. While it worked for
>>> integer and floats (not complex ones), I got an error when using complex
>>> dtypes. This error that is also present in array creation of complex
>>> dtypes. Is this known or a new bug?
>>>
>>> If it is new, then we need to identify what functionality should handle
>>> np.NA but are not working.
>>>
>>> Bruce
>>>
>>> $ python
>>> Python 2.7 (r27:82500, Sep 16 2010, 18:02:00)
>>> [GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2
>>> Type "help", "copyright", "credits" or "license" for more information.
>>> >>> import numpy as np
>>> >>> np.__version__ # pull request version
>>> '2.0.0.dev-88f9276'
>>> >>> np.array([1,2], dtype=np.complex)
>>> array([ 1.+0.j,  2.+0.j])
>>> >>> np.array([1,2, np.NA], dtype=np.complex)
>>> Traceback (most recent call last):
>>>   File "", line 1, in 
>>>   File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line
>>> 1445, in array_repr
>>> ', ', "array(")
>>>   File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py",
>>> line 459, in array2string
>>> separator, prefix, formatter=formatter)
>>>   File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py",
>>> line 263, in _array2string
>>> suppress_small),
>>>   File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py",
>>> line 724, in __init__
>>> self.real_format = FloatFormat(x.real, precision, suppress_small)
>>> ValueError: Cannot construct a view of data together with the
>>> NPY_ARRAY_MASKNA flag, the NA mask must be added later
>>> >>> ca=np.array([1,2], dtype=np.complex, maskna=True)
>>> >>> ca[1]=np.NA
>>> >>> ca
>>> Traceback (most recent call last):
>>>   File "", line 1, in 
>>>   File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line
>>> 1445, in array_repr
>>> ', ', "array(")
>>>   File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py",
>>> line 459, in array2string
>>> separator, prefix, formatter=formatter)
>>>   File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py",
>>> line 263, in _array2string
>>> suppress_small),
>>>   File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py",
>>> line 724, in __init__
>>> self.real_format = FloatFormat(x.real, precision, suppress_small)
>>> ValueError: Cannot construct a view of data together with the
>>> NPY_ARRAY_MASKNA flag, the NA mask must be added later
>>> >>>
>>>
>>>
>> Looks like a different bug involving the *.real and *.imag views. I'll
>> take a look.
>>
>>
> Looks like views of masked arrays have other problems:
>
> In [13]: a = ones(3, int16, maskna=1)
>
> In [14]: a.view(int8)
> Out[14]: array([1, 0, 1, NA, 1, NA], dtype=int8)
>
>
This looks like a serious bug to me, to avoid memory corruption issues it
should raise an exception.

-Mark


>
> I'm not sure what the policy should be here. One could construct a new
> mask adapted to the view, raise an error when the types don't align (I
> think the real/imag parts should be considered aligned), or just let the
> view unmask the array. The last seems dangerous. Hmm...
>



> Chuck
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.percentile multiple arrays

2012-01-24 Thread Brett Olsen
On Tue, Jan 24, 2012 at 6:22 PM, questions anon
 wrote:
> I need some help understanding how to loop through many arrays to calculate
> the 95th percentile.
> I can easily do this by using numpy.concatenate to make one big array and
> then finding the 95th percentile using numpy.percentile but this causes a
> memory error when I want to run this on 100's of netcdf files (see code
> below).
> Any alternative methods will be greatly appreciated.
>
>
> all_TSFC=[]
> for (path, dirs, files) in os.walk(MainFolder):
>     for dir in dirs:
>         print dir
>         path=path+'/'
>         for ncfile in files:
>             if ncfile[-3:]=='.nc':
>                 print "dealing with ncfiles:", ncfile
>                 ncfile=os.path.join(path,ncfile)
>                 ncfile=Dataset(ncfile, 'r+', 'NETCDF4')
>                 TSFC=ncfile.variables['T_SFC'][:]
>                 ncfile.close()
>                 all_TSFC.append(TSFC)
>
> big_array=N.ma.concatenate(all_TSFC)
> Percentile95th=N.percentile(big_array, 95, axis=0)

If the range of your data is known and limited (i.e., you have a
comparatively small number of possible values, but a number of repeats
of each value) then you could do this by keeping a running cumulative
distribution function as you go through each of your files.  For each
file, calculate a cumulative distribution function --- at each
possible value, record the fraction of that population strictly less
than that value --- and then it's straightforward to combine the
cumulative distribution functions from two separate files:
cumdist_both = (cumdist1 * N1 + cumdist2 * N2) / (N1 + N2)

Then once you've gone through all the files, look for the value where
your cumulative distribution function is equal to 0.95.  If your data
isn't structured with repeated values, though, this won't work,
because your cumulative distribution function will become too big to
hold in memory.  In that case, what I would probably do would be an
iterative approach:  make an approximation to the exact function by
removing some fraction of the possible values, which will provide a
limited range for the exact percentile you want, and then walk through
the files again calculating the function more exactly within the
limited range, repeating until you have the value to the desired
precision.
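
A rough sketch of the running-CDF approach, assuming the data really is
restricted to a known, limited set of values (the value grid, array sizes,
and names below are made up purely for illustration):

import numpy as np

# Hypothetical value grid: the approach assumes the data only takes values
# from a known, limited set (half-degree steps here, just for illustration).
values = np.arange(0.0, 50.0, 0.5)
counts = np.zeros(len(values), dtype=np.int64)  # running histogram over those values

def add_file(counts, values, data):
    """Fold one file's samples into the running histogram."""
    idx = np.searchsorted(values, data.ravel())   # bin index of each sample
    counts += np.bincount(idx, minlength=len(values))
    return counts

# In the real loop this would be the TSFC array read from each netCDF file;
# two fake arrays stand in for two files here.
for fake_file in (values[np.random.randint(0, len(values), 100000)],
                  values[np.random.randint(0, len(values), 250000)]):
    counts = add_file(counts, values, fake_file)

# Building the cumulative fractions from the raw counts combines the files
# weighted by their sizes, i.e. the (cumdist1*N1 + cumdist2*N2)/(N1 + N2)
# step happens implicitly.
cumdist = np.cumsum(counts) / float(counts.sum())
percentile_95 = values[np.searchsorted(cumdist, 0.95)]
print(percentile_95)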

~Brett
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.percentile multiple arrays

2012-01-24 Thread questions anon
Thanks for your responses.
Because of the size of the dataset I will still end up with the memory
error if I calculate the median for each file; additionally, the files are
not all the same size. I believe this memory problem will still arise with
the cumulative distribution calculation, and I am not sure I understand how to
write the second suggestion about the iterative approach, but I will have a go.
Thanks again

On Wed, Jan 25, 2012 at 1:26 PM, Brett Olsen  wrote:

> On Tue, Jan 24, 2012 at 6:22 PM, questions anon
>  wrote:
> > I need some help understanding how to loop through many arrays to
> calculate
> > the 95th percentile.
> > I can easily do this by using numpy.concatenate to make one big array and
> > then finding the 95th percentile using numpy.percentile but this causes a
> > memory error when I want to run this on 100's of netcdf files (see code
> > below).
> > Any alternative methods will be greatly appreciated.
> >
> >
> > all_TSFC=[]
> > for (path, dirs, files) in os.walk(MainFolder):
> > for dir in dirs:
> > print dir
> > path=path+'/'
> > for ncfile in files:
> > if ncfile[-3:]=='.nc':
> > print "dealing with ncfiles:", ncfile
> > ncfile=os.path.join(path,ncfile)
> > ncfile=Dataset(ncfile, 'r+', 'NETCDF4')
> > TSFC=ncfile.variables['T_SFC'][:]
> > ncfile.close()
> > all_TSFC.append(TSFC)
> >
> > big_array=N.ma.concatenate(all_TSFC)
> > Percentile95th=N.percentile(big_array, 95, axis=0)
>
> If the range of your data is known and limited (i.e., you have a
> comparatively small number of possible values, but a number of repeats
> of each value) then you could do this by keeping a running cumulative
> distribution function as you go through each of your files.  For each
> file, calculate a cumulative distribution function --- at each
> possible value, record the fraction of that population strictly less
> than that value --- and then it's straightforward to combine the
> cumulative distribution functions from two separate files:
> cumdist_both = (cumdist1 * N1 + cumdist2 * N2) / (N1 + N2)
>
> Then once you've gone through all the files, look for the value where
> your cumulative distribution function is equal to 0.95.  If your data
> isn't structured with repeated values, though, this won't work,
> because your cumulative distribution function will become too big to
> hold into memory.  In that case, what I would probably do would be an
> iterative approach:  make an approximation to the exact function by
> removing some fraction of the possible values, which will provide a
> limited range for the exact percentile you want, and then walk through
> the files again calculating the function more exactly within the
> limited range, repeating until you have the value to the desired
> precision.
>
> ~Brett
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.percentile multiple arrays

2012-01-24 Thread Olivier Delalleau
Note that if you are ok with an approximate solution, and you can assume
your data is somewhat shuffled, a simple online algorithm that uses almost no
memory consists of:
- choosing a small step size delta
- initializing your percentile p to a more or less random value (a
meaningful guess is better though)
- iterating through your samples, updating p after each sample by p += 19 *
delta if sample > p, and p -= delta otherwise

The idea is that the 95th percentile is such that 5% of the data is higher,
and 95% (19 times more) is lower, so if p is equal to this value, on
average it should remain constant through the online update.
You may do multiple passes if you are not confident in your initial value,
possibly reducing delta over time to improve accuracy.
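
A minimal sketch of a single pass, with a fabricated sample stream and
arbitrarily chosen delta and starting value just to exercise the idea:

import numpy as np

def online_percentile(samples, q=0.95, delta=0.01, p0=0.0):
    """Stochastic approximation of the q-th quantile from a stream of samples."""
    p = p0
    up = delta * q / (1.0 - q)   # 19 * delta when q = 0.95
    for x in samples:
        if x > p:
            p += up      # only a fraction (1 - q) of samples land here,
        else:
            p -= delta   # so at the true quantile the expected move is zero
    return p

# Fake data just to exercise the sketch; real use would stream values file by file.
stream = np.random.normal(loc=3000.0, scale=10.0, size=200000)
print(online_percentile(stream, q=0.95, delta=0.01, p0=3000.0))
print(np.percentile(stream, 95))   # exact value on this sample, for comparison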

-=- Olivier

2012/1/24 questions anon 

> thanks for your responses,
> because of the size of the dataset I will still end up with the memory
> error if I calculate the median for each file, additionally the files are
> not all the same size. I believe this memory problem will still arise with
> the cumulative distribution calculation and not sure I understand how to
> write the second suggestion about the iterative approach but will have a go.
> Thanks again
>
>
> On Wed, Jan 25, 2012 at 1:26 PM, Brett Olsen wrote:
>
>> On Tue, Jan 24, 2012 at 6:22 PM, questions anon
>>  wrote:
>> > I need some help understanding how to loop through many arrays to
>> calculate
>> > the 95th percentile.
>> > I can easily do this by using numpy.concatenate to make one big array
>> and
>> > then finding the 95th percentile using numpy.percentile but this causes
>> a
>> > memory error when I want to run this on 100's of netcdf files (see code
>> > below).
>> > Any alternative methods will be greatly appreciated.
>> >
>> >
>> > all_TSFC=[]
>> > for (path, dirs, files) in os.walk(MainFolder):
>> > for dir in dirs:
>> > print dir
>> > path=path+'/'
>> > for ncfile in files:
>> > if ncfile[-3:]=='.nc':
>> > print "dealing with ncfiles:", ncfile
>> > ncfile=os.path.join(path,ncfile)
>> > ncfile=Dataset(ncfile, 'r+', 'NETCDF4')
>> > TSFC=ncfile.variables['T_SFC'][:]
>> > ncfile.close()
>> > all_TSFC.append(TSFC)
>> >
>> > big_array=N.ma.concatenate(all_TSFC)
>> > Percentile95th=N.percentile(big_array, 95, axis=0)
>>
>> If the range of your data is known and limited (i.e., you have a
>> comparatively small number of possible values, but a number of repeats
>> of each value) then you could do this by keeping a running cumulative
>> distribution function as you go through each of your files.  For each
>> file, calculate a cumulative distribution function --- at each
>> possible value, record the fraction of that population strictly less
>> than that value --- and then it's straightforward to combine the
>> cumulative distribution functions from two separate files:
>> cumdist_both = (cumdist1 * N1 + cumdist2 * N2) / (N1 + N2)
>>
>> Then once you've gone through all the files, look for the value where
>> your cumulative distribution function is equal to 0.95.  If your data
>> isn't structured with repeated values, though, this won't work,
>> because your cumulative distribution function will become too big to
>> hold into memory.  In that case, what I would probably do would be an
>> iterative approach:  make an approximation to the exact function by
>> removing some fraction of the possible values, which will provide a
>> limited range for the exact percentile you want, and then walk through
>> the files again calculating the function more exactly within the
>> limited range, repeating until you have the value to the desired
>> precision.
>>
>> ~Brett
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread josef . pktd
On Tue, Jan 24, 2012 at 7:21 PM, eat  wrote:

> Hi
>
> On Wed, Jan 25, 2012 at 1:21 AM, Kathleen M Tacina <
> kathleen.m.tac...@nasa.gov> wrote:
>
>> I found something similar, with a very simple example.
>>
>> On 64-bit linux, python 2.7.2, numpy development version:
>>
>> In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32)
>>
>> In [23]: a.mean()
>> Out[23]: 4034.16357421875
>>
>> In [24]: np.version.full_version
>> Out[24]: '2.0.0.dev-55472ca'
>>
>>
>> But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives:
>> >>>a = np.ones((1024,1024),dtype=np.float32)
>> >>>a.mean()
>> 4000.0
>> >>>np.version.full_version
>> '1.6.1'
>>
> This indeed looks very nasty, regardless of whether it is a version or
> platform related problem.
>

Looks platform specific; same result as eat.

Windows 7,
Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit
(Intel)] on win32

>>> a = np.ones((1024,1024),dtype=np.float32)
>>> a.mean()
1.0

>>> (4000*a).dtype
dtype('float32')
>>> (4000*a).mean()
4000.0

>>> b = np.load("data.npy")
>>> b.mean()
3045.74718
>>> b.shape
(1000, 1000)
>>> b.mean(0).mean(0)
3045.74724
>>> _.dtype
dtype('float64')
>>> b.dtype
dtype('float32')

>>> b.mean(dtype=np.float32)
3045.74718

Josef


>
> -eat
>
>>
>>
>>
>> On Tue, 2012-01-24 at 17:12 -0600, eat wrote:
>>
>> Hi,
>>
>>
>>
>>  Oddly, numpy 1.6 seems to behave in a more consistent manner:
>>
>>
>>
>>  In []: sys.version
>>
>>  Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit
>> (Intel)]'
>>
>>  In []: np.version.version
>>
>>  Out[]: '1.6.0'
>>
>>
>>
>>  In []: d= np.load('data.npy')
>>
>>  In []: d.dtype
>>
>>  Out[]: dtype('float32')
>>
>>
>>
>>  In []: d.mean()
>>
>>  Out[]: 3045.74718
>>
>>  In []: d.mean(dtype= np.float32)
>>
>>  Out[]: 3045.74718
>>
>>  In []: d.mean(dtype= np.float64)
>>
>>  Out[]: 3045.747251076416
>>
>>  In []: (d- d.min()).mean()+ d.min()
>>
>>  Out[]: 3045.7472508750002
>>
>>  In []: d.mean(axis= 0).mean()
>>
>>  Out[]: 3045.74724
>>
>>  In []: d.mean(axis= 1).mean()
>>
>>  Out[]: 3045.74724
>>
>>
>>
>>  Or does the results of calculations depend more on the platform?
>>
>>
>>
>>
>>
>>  My 2 cents,
>>
>>  eat
>>
>>   --
>> --
>> Kathleen M. Tacina
>> NASA Glenn Research Center
>> MS 5-10
>> 21000 Brookpark Road
>> Cleveland, OH 44135
>> Telephone: (216) 433-6660
>> Fax: (216) 433-5802
>> --
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Charles R Harris
On Tue, Jan 24, 2012 at 4:21 PM, Kathleen M Tacina <
kathleen.m.tac...@nasa.gov> wrote:

> I found something similar, with a very simple example.
>
> On 64-bit linux, python 2.7.2, numpy development version:
>
> In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32)
>
> In [23]: a.mean()
> Out[23]: 4034.16357421875
>
> In [24]: np.version.full_version
> Out[24]: '2.0.0.dev-55472ca'
>
>
> But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives:
> >>>a = np.ones((1024,1024),dtype=np.float32)
> >>>a.mean()
> 4000.0
> >>>np.version.full_version
> '1.6.1'
>
>
>
Yes, the results are platform/compiler dependent. The 32 bit platforms tend
to use extended precision accumulators and the x87 instruction set. The 64
bit platforms tend to use sse2+. Different precisions, even though you
might think they are the same.
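
For what it's worth, a small sketch of the effect and the usual workaround
(the exact amount of drift, if any, depends on the platform as described):

import numpy as np

a = 4000 * np.ones((1024, 1024), dtype=np.float32)

# With a single-precision accumulator the running sum (around 4.2e9) can no
# longer absorb increments of 4000 exactly, so the result may drift on
# platforms that accumulate in float32.
print(a.mean())                   # 4000.0 or roughly 4034.16, platform dependent

# Forcing a double-precision accumulator gives the expected answer everywhere.
print(a.mean(dtype=np.float64))   # 4000.0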



Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread josef . pktd
On Wed, Jan 25, 2012 at 12:03 AM, Charles R Harris
 wrote:
>
>
> On Tue, Jan 24, 2012 at 4:21 PM, Kathleen M Tacina
>  wrote:
>>
>> I found something similar, with a very simple example.
>>
>> On 64-bit linux, python 2.7.2, numpy development version:
>>
>> In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32)
>>
>> In [23]: a.mean()
>> Out[23]: 4034.16357421875
>>
>> In [24]: np.version.full_version
>> Out[24]: '2.0.0.dev-55472ca'
>>
>>
>> But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives:
>> >>>a = np.ones((1024,1024),dtype=np.float32)
>> >>>a.mean()
>> 4000.0
>> >>>np.version.full_version
>> '1.6.1'
>>
>>
>
> Yes, the results are platform/compiler dependent. The 32 bit platforms tend
> to use extended precision accumulators and the x87 instruction set. The 64
> bit platforms tend to use sse2+. Different precisions, even though you might
> think they are the same.

Just to confirm: on the same computer as before, but with the 64-bit Python
3.2 build, I now get the "Linux" result:

Python 3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit
(AMD64)] on win32

>>> import numpy as np
>>> np.__version__
'1.5.1'
>>> a = 4000*np.ones((1024,1024),dtype=np.float32)
>>> a.mean()
4034.16357421875
>>> a.mean(0).mean(0)
4000.0
>>> a.mean(dtype=np.float64)
4000.0

Josef

>
> 
>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion