Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-26 Thread Sturla Molden
Den 24.01.2012 17:19, skrev David Warde-Farley:

 Hmm. Seeing as the width of a C long is inconsistent, does this imply that
 the random number generator will produce different results on different
 platforms?

If it does, it is a C programming mistake. C code should never depend on 
the exact size of a long, only its minimum size.  ISO C99 defines 
exact-width integer types (in <stdint.h>) for when an exact size is 
needed, but the ANSI C used for NumPy does not.
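
As a quick illustration (a sketch using Python's ctypes, not the NumPy
sources themselves): the width of a C long varies by platform, while
exact-width and pointer-sized types do not.

```python
import ctypes

# sizeof(long) is platform-dependent: 8 bytes on LP64 systems
# (64-bit Linux/macOS), but 4 bytes on LLP64 (64-bit Windows).
print(ctypes.sizeof(ctypes.c_long))    # 4 or 8, depending on platform

# C99 <stdint.h> exact-width types are the same size everywhere:
print(ctypes.sizeof(ctypes.c_int64))   # always 8

# size_t always matches the pointer width:
print(ctypes.sizeof(ctypes.c_size_t))  # 8 on any 64-bit platform
```

On 64-bit Windows the first print shows 4; on 64-bit Linux it shows 8,
which is exactly the inconsistency being discussed.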

Sturla
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-25 Thread Sturla Molden
On 24.01.2012 23:30, David Warde-Farley wrote:

 I've figured it out. In numpy/core/src/multiarray/mapping.c, PyArray_GetMap
 is using an int for a counter variable where it should be using an npy_intp.

 I've filed a pull request at https://github.com/numpy/numpy/pull/188 with a
 regression test.

That is great :)

Now we just need to fix mtrand.pyx and all this will be gone.



Sturla


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-25 Thread Charles R Harris
On Tue, Jan 24, 2012 at 3:30 PM, David Warde-Farley 
warde...@iro.umontreal.ca wrote:

 On Tue, Jan 24, 2012 at 01:02:44PM -0500, David Warde-Farley wrote:
  On Tue, Jan 24, 2012 at 06:37:12PM +0100, Robin wrote:
 
   Yes - I get exactly the same numbers in 64 bit windows with 1.6.1.
 
  Alright, so that rules out platform specific effects.
 
  I'll try and hunt the bug down when I have some time, if someone more
  familiar with the indexing code doesn't beat me to it.

 I've figured it out. In numpy/core/src/multiarray/mapping.c, PyArray_GetMap
 is using an int for a counter variable where it should be using an
 npy_intp.

 I've filed a pull request at https://github.com/numpy/numpy/pull/188 with
 a
 regression test.


I think this bug, or one like it, was reported a couple of years ago. But I
don't recall if there was ever a ticket opened.

Chuck


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Sturla Molden
On 24.01.2012 06:32, Sturla Molden wrote:

 The use of C long affects all the C and Pyrex source code in mtrand
 module, not just mtrand.pyx. All of it is fubar on Win64.

randomkit.c handles C long correctly, I think. There are separate code 
paths for 32- and 64-bit C long, and buffer sizes are size_t.

Sturla



Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Sturla Molden
On 24.01.2012 09:21, Sturla Molden wrote:

 randomkit.c handles C long correctly, I think. There are different codes
 for 32 and 64 bit C long, and buffer sizes are size_t.

distributions.c takes C longs as parameters, e.g. for the binomial 
distribution. mtrand.pyx handles this correctly, but it can give an 
unexpected overflow error on 64-bit Windows:


In [1]: np.random.binomial(2**31, .5)
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
C:\Windows\system32\<ipython-input-1-000aa0626c42> in <module>()
----> 1 np.random.binomial(2**31, .5)

C:\Python27\lib\site-packages\numpy\random\mtrand.pyd in 
mtrand.RandomState.binomial (numpy\random\mtrand\mtrand.c:13770)()

OverflowError: Python int too large to convert to C long


On systems where C longs are 64 bit, this is likely not to produce an 
error.

This raises the question of whether randomkit.c and distributions.c should 
also be changed to use npy_intp, for consistency across all platforms.

(I assume we are not supporting 16-bit NumPy, in which case we would need 
C long there...)


Sturla


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Sturla Molden
On 24.01.2012 06:32, Sturla Molden wrote:
 Den 24.01.2012 06:00, skrev Sturla Molden:
 Both i and length could overflow here. It should overflow on
 allocation of more than 2 GB. There is also a lot of C longs in the
 internal state (line 55-105), as well as the other functions.

 The use of C long affects all the C and Pyrex source code in mtrand
 module, not just mtrand.pyx. All of it is fubar on Win64.


The coding is also inconsistent; compare, for example:

https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L180

https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L201



Sturla


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Robert Kern
On Tue, Jan 24, 2012 at 08:37, Sturla Molden stu...@molden.no wrote:
 On 24.01.2012 09:21, Sturla Molden wrote:

 randomkit.c handles C long correctly, I think. There are different codes
 for 32 and 64 bit C long, and buffer sizes are size_t.

 distributions.c take C longs as parameters e.g. for the binomial
 distribution. mtrand.pyx correctly handles this, but it can give an
 unexpected overflow error on 64-bit Windows:


 In [1]: np.random.binomial(2**31, .5)
 ---
 OverflowError                             Traceback (most recent call last)
 C:\Windows\system32\ipython-input-1-000aa0626c42 in module()
  1 np.random.binomial(2**31, .5)

 C:\Python27\lib\site-packages\numpy\random\mtrand.pyd in
 mtrand.RandomState.binomial (numpy\random\mtrand\mtrand.c:13770)()

 OverflowError: Python int too large to convert to C long


 On systems where C longs are 64 bit, this is likely not to produce an
 error.

 This begs the question if also randomkit.c and districutions.c should be
 changed to use npy_intp for consistency across all platforms.

There are two different uses of long that you need to distinguish. One
is for sizes, and one is for parameters and values. The sizes should
definitely be upgraded to npy_intp. The latter shouldn't; these should
remain as the default integer type of Python and numpy, a C long.
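
A rough sketch of that distinction, using Python's ctypes to mimic the C
types involved (the variable names are illustrative, not from the NumPy
sources): a buffer size can overflow a 32-bit C long on Win64, while a
parameter value is expected to fit a C long on every platform.

```python
import ctypes

size = 2 * 1024**3        # a 2 GiB buffer length: needs npy_intp (pointer-sized)
param = 2**31 - 1         # a parameter value: fits a C long on every platform

# On LLP64 (64-bit Windows) a C long is 4 bytes, so a size this large
# wraps silently (ctypes, like C, does no overflow checking):
wrapped = ctypes.c_int32(size).value
print(wrapped)            # -2147483648, silently corrupted

# A pointer-sized counter, as npy_intp is on a 64-bit build, holds it fine:
print(ctypes.c_int64(size).value)    # 2147483648

assert ctypes.c_int32(param).value == param  # parameters are safe either way
```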

The reason longs are used for sizes is that I wrote mtrand for Numeric
and Python 2.4 before numpy was even announced (and I don't think we
had npy_intp at the time I merged it into numpy, but I could be
wrong). Using longs for sizes was the order of the day. I don't think
I had even touched a 64-bit machine that wasn't a DEC Alpha at the
time, so I knew very little about the issues.

So yes, please, fix whatever you can.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Robert Kern
On Tue, Jan 24, 2012 at 08:47, Sturla Molden stu...@molden.no wrote:

 The coding is also inconsistent, compare for example:

 https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L180

 https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L201

I'm sorry, what are you demonstrating there?

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Sturla Molden
On 24.01.2012 10:16, Robert Kern wrote:

 I'm sorry, what are you demonstrating there?

Both npy_intp and C long are used for sizes and indexing.

Sturla


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Robert Kern
On Tue, Jan 24, 2012 at 09:19, Sturla Molden stu...@molden.no wrote:
 On 24.01.2012 10:16, Robert Kern wrote:

 I'm sorry, what are you demonstrating there?

 Both npy_intp and C long are used for sizes and indexing.

Ah, yes. I think Travis added the multiiter code to cont1_array(),
which does broadcasting, so he used npy_intp as is proper (and
necessary to pass into the multiiter API). The other functions don't
do broadcasting, so he didn't touch them.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Sturla Molden
On 24.01.2012 10:15, Robert Kern wrote:

 There are two different uses of long that you need to distinguish. One
 is for sizes, and one is for parameters and values. The sizes should
 definitely be upgraded to npy_intp. The latter shouldn't; these should
 remain as the default integer type of Python and numpy, a C long.

OK, that makes sense.

 The reason longs are used for sizes is that I wrote mtrand for Numeric
 and Python 2.4 before numpy was even announced (and I don't think we
 had npy_intp at the time I merged it into numpy, but I could be
 wrong). Using longs for sizes was the order of the day. I don't think
 I had even touched a 64-bit machine that wasn't a DEC Alpha at the
 time, so I knew very little about the issues.


On amd64, the native addressing is actually a 64-bit pointer with a 32-bit 
offset (contrary to what we see in Python and NumPy C sources), which is 
one reason why C longs are still 32 bits in MSVC. Thus an array size 
(size_t) should be 64 bits, but array indices (C long) should be 32 bits. 
But nobody likes to code like that (e.g. we would need an extra 64-bit 
pointer as a cursor if the buffer size overflows a C long), and I don't 
think using a non-native 64-bit offset incurs much extra overhead for 
the CPU.

:-)

Sturla


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread David Warde-Farley
On Tue, Jan 24, 2012 at 09:15:01AM +, Robert Kern wrote:
 On Tue, Jan 24, 2012 at 08:37, Sturla Molden stu...@molden.no wrote:
  On 24.01.2012 09:21, Sturla Molden wrote:
 
  randomkit.c handles C long correctly, I think. There are different codes
  for 32 and 64 bit C long, and buffer sizes are size_t.
 
  distributions.c take C longs as parameters e.g. for the binomial
  distribution. mtrand.pyx correctly handles this, but it can give an
  unexpected overflow error on 64-bit Windows:
 
 
  In [1]: np.random.binomial(2**31, .5)
  ---
  OverflowError                             Traceback (most recent call last)
  C:\Windows\system32\ipython-input-1-000aa0626c42 in module()
   1 np.random.binomial(2**31, .5)
 
  C:\Python27\lib\site-packages\numpy\random\mtrand.pyd in
  mtrand.RandomState.binomial (numpy\random\mtrand\mtrand.c:13770)()
 
  OverflowError: Python int too large to convert to C long
 
 
  On systems where C longs are 64 bit, this is likely not to produce an
  error.
 
  This begs the question if also randomkit.c and districutions.c should be
  changed to use npy_intp for consistency across all platforms.
 
 There are two different uses of long that you need to distinguish. One
 is for sizes, and one is for parameters and values. The sizes should
 definitely be upgraded to npy_intp. The latter shouldn't; these should
 remain as the default integer type of Python and numpy, a C long.

Hmm. Seeing as the width of a C long is inconsistent, does this imply that
the random number generator will produce different results on different
platforms? Or do the state dynamics prevent it from ever growing in magnitude
to the point where that's an issue?

David


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread David Warde-Farley
On Tue, Jan 24, 2012 at 06:00:05AM +0100, Sturla Molden wrote:
 Den 23.01.2012 22:08, skrev Christoph Gohlke:
 
  Maybe this explains the win-amd64 behavior: There are a couple of places
  in mtrand where array indices and sizes are C long instead of npy_intp,
  for example in the randint function:
 
  https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L863
 
 
 
 Both i and length could overflow here. It should overflow on allocation 
 of more than 2 GB.
 
 There is also a lot of C longs in the internal state (line 55-105), as 
 well as the other functions.
 
 Producing 2 GB of random ints twice fails:

Sturla, since you seem to have access to Win64 machines, do you suppose you
could try this code:

>>> a = numpy.ones((1, 972))
>>> b = numpy.zeros((4993210,), dtype=int)
>>> c = a[b]

and verify that there's a whole lot of 0s in the matrix, specifically,

>>> c[574519:].sum()
356.0
>>> c[574520:].sum()
0.0

is the case on Linux 64-bit; is it the case on Windows 64?

Thanks a lot,

David



Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Robin
On Tue, Jan 24, 2012 at 6:24 PM, David Warde-Farley
warde...@iro.umontreal.ca wrote:
 On Tue, Jan 24, 2012 at 06:00:05AM +0100, Sturla Molden wrote:
 Den 23.01.2012 22:08, skrev Christoph Gohlke:
 
  Maybe this explains the win-amd64 behavior: There are a couple of places
  in mtrand where array indices and sizes are C long instead of npy_intp,
  for example in the randint function:
 
  https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L863
 
 

 Both i and length could overflow here. It should overflow on allocation
 of more than 2 GB.

 There is also a lot of C longs in the internal state (line 55-105), as
 well as the other functions.

 Producing 2 GB of random ints twice fails:

 Sturla, since you seem to have access to Win64 machines, do you suppose you
 could try this code:

 a = numpy.ones((1, 972))
 b = numpy.zeros((4993210,), dtype=int)
 c = a[b]

 and verify that there's a whole lot of 0s in the matrix, specifically,

 c[574519:].sum()
 356.0
 c[574520:].sum()
 0.0

 is the case on Linux 64-bit; is it the case on Windows 64?

Yes - I get exactly the same numbers in 64 bit windows with 1.6.1.

Cheers

Robin


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread David Warde-Farley
On Tue, Jan 24, 2012 at 06:37:12PM +0100, Robin wrote:

 Yes - I get exactly the same numbers in 64 bit windows with 1.6.1.

Alright, so that rules out platform specific effects.

I'll try and hunt the bug down when I have some time, if someone more
familiar with the indexing code doesn't beat me to it.

David


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread David Warde-Farley
On Tue, Jan 24, 2012 at 01:02:44PM -0500, David Warde-Farley wrote:
 On Tue, Jan 24, 2012 at 06:37:12PM +0100, Robin wrote:
 
  Yes - I get exactly the same numbers in 64 bit windows with 1.6.1.
 
 Alright, so that rules out platform specific effects.
 
 I'll try and hunt the bug down when I have some time, if someone more
 familiar with the indexing code doesn't beat me to it.

I've figured it out. In numpy/core/src/multiarray/mapping.c, PyArray_GetMap
is using an int for a counter variable where it should be using an npy_intp.
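
A sketch of why an int counter fails for this operation (the loop
variable and structure of mapping.c are not reproduced here, only the
arithmetic): the copy must visit len(b) * a.shape[1] = 4993210 * 972
elements, which exceeds INT_MAX.

```python
import ctypes

total = 4993210 * 972               # elements the indexing loop must copy
assert total > 2**31 - 1            # exceeds INT_MAX on every platform

# What a 32-bit C "int" counter would hold after counting that high
# (ctypes wraps silently, like C):
print(ctypes.c_int32(total).value)  # 558432824, not 4853400120

# npy_intp on a 64-bit build is 8 bytes, so it holds the count correctly:
print(ctypes.c_int64(total).value)  # 4853400120
```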

I've filed a pull request at https://github.com/numpy/numpy/pull/188 with a
regression test.

David


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Samuel John

On 23.01.2012, at 11:23, David Warde-Farley wrote:
 a = numpy.array(numpy.random.randint(256,size=(500,972)),dtype='uint8')
 b = numpy.random.randint(500,size=(4993210,))
 c = a[b]
 In [14]: c[100:].sum()
 Out[14]: 0

Same here.

Python 2.7.2, 64bit, Mac OS X (Lion), 8GB RAM, numpy.__version__ = 
2.0.0.dev-55472ca
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.1.00)]
Numpy built without llvm.


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-23 Thread David Warde-Farley
I've reproduced this (rather serious) bug myself and confirmed that it exists
in master, and as far back as 1.4.1.

I'd really appreciate if someone could reproduce and confirm on another
machine, as so far all my testing has been on our single high-memory machine.

Thanks,
David

On Mon, Jan 23, 2012 at 05:23:28AM -0500, David Warde-Farley wrote:
 A colleague has run into this weird behaviour with NumPy 1.6.1, EPD 7.1-2, on 
 Linux (Fedora Core 14) 64-bit:
 
  a = numpy.array(numpy.random.randint(256,size=(500,972)),dtype='uint8')
  b = numpy.random.randint(500,size=(4993210,))
  c = a[b]
 
 It seems c is not getting filled in full, namely:
 
  In [14]: c[100:].sum()
  Out[14]: 0
 
 I haven't been able to reproduce this quite yet, I'll try to find a machine 
 with sufficient memory tomorrow. But does anyone have any insight in the mean 
 time? It smells like some kind of integer overflow bug.
 
 Thanks,
 
 David


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-23 Thread Travis Oliphant
Can you determine where the problem is, precisely? In other words, can you 
verify that c is not getting filled in correctly?

You are no doubt going to get overflow in the summation, as you have a uint8 
parameter. But having that overflow be exactly '0' would be surprising.

Can you verify that a and b are getting created correctly? Also, 'c' should 
be a 2-d array; can you verify that? Can you take the sum along the -1 axis 
and the 0 axis separately:

print a.shape
print b.shape
print c.shape

c[100:].sum(axis=0)
d = c[100:].sum(axis=-1)
print d[:100]
print d[-100:]



On Jan 23, 2012, at 12:55 PM, David Warde-Farley wrote:

 I've reproduced this (rather serious) bug myself and confirmed that it exists
 in master, and as far back as 1.4.1.
 
 I'd really appreciate if someone could reproduce and confirm on another
 machine, as so far all my testing has been on our single high-memory machine.
 
 Thanks,
 David
 
 On Mon, Jan 23, 2012 at 05:23:28AM -0500, David Warde-Farley wrote:
 A colleague has run into this weird behaviour with NumPy 1.6.1, EPD 7.1-2, 
 on Linux (Fedora Core 14) 64-bit:
 
 a = numpy.array(numpy.random.randint(256,size=(500,972)),dtype='uint8')
 b = numpy.random.randint(500,size=(4993210,))
 c = a[b]
 
 It seems c is not getting filled in full, namely:
 
 In [14]: c[100:].sum()
 Out[14]: 0
 
 I haven't been able to reproduce this quite yet, I'll try to find a machine 
 with sufficient memory tomorrow. But does anyone have any insight in the 
 mean time? It smells like some kind of integer overflow bug.
 
 Thanks,
 
 David


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-23 Thread Robin
On Mon, Jan 23, 2012 at 7:55 PM, David Warde-Farley
warde...@iro.umontreal.ca wrote:
 I've reproduced this (rather serious) bug myself and confirmed that it exists
 in master, and as far back as 1.4.1.

 I'd really appreciate if someone could reproduce and confirm on another
 machine, as so far all my testing has been on our single high-memory machine.

I see the same behaviour on a Windows machine with numpy 1.6.1. But I
don't think it is an indexing problem - rather something with the
random number creation: a itself is already zeros for high indices.

In [8]: b[100:110]
Out[8]:
array([3429029, 1251819, 4292918, 2249483,  757620, 3977130, 3455449,
   2005054, 2565207, 3114930])

In [9]: a[b[100:110]]
Out[9]:
array([[0, 0, 0, ..., 0, 0, 0],
   [0, 0, 0, ..., 0, 0, 0],
   [0, 0, 0, ..., 0, 0, 0],
   ...,
   [0, 0, 0, ..., 0, 0, 0],
   [0, 0, 0, ..., 0, 0, 0],
   [0, 0, 0, ..., 0, 0, 0]], dtype=uint8)

In [41]: a[581350:,0].sum()
Out[41]: 0

Cheers

Robin

 Thanks,
 David

 On Mon, Jan 23, 2012 at 05:23:28AM -0500, David Warde-Farley wrote:
 A colleague has run into this weird behaviour with NumPy 1.6.1, EPD 7.1-2, 
 on Linux (Fedora Core 14) 64-bit:

  a = numpy.array(numpy.random.randint(256,size=(500,972)),dtype='uint8')
  b = numpy.random.randint(500,size=(4993210,))
  c = a[b]

 It seems c is not getting filled in full, namely:

  In [14]: c[100:].sum()
  Out[14]: 0

 I haven't been able to reproduce this quite yet, I'll try to find a machine 
 with sufficient memory tomorrow. But does anyone have any insight in the 
 mean time? It smells like some kind of integer overflow bug.

 Thanks,

 David


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-23 Thread Aronne Merrelli
On Mon, Jan 23, 2012 at 1:33 PM, Travis Oliphant teoliph...@gmail.comwrote:

 Can you determine where the problem is, precisely.In other words, can
 you verify that c is not getting filled in correctly?

 You are no doubt going to get overflow in the summation as you have a
 uint8 parameter.   But, having that overflow be exactly '0' would be
 surprising.

 Can you verify that a and b are getting created correctly?   Also, 'c'
 should be a 2-d array, can you verify that?  Can you take the sum along the
 -1 axis and the 0 axis separately:

 print a.shape
 print b.shape
 print c.shape

 c[100:].sum(axis=0)
 d = c[100:].sum(axis=-1)
 print d[:100]
 print d[-100:]



I am getting the same results as David. It looks like c just stopped
filling in partway through the array. I don't think there is any overflow
issue, since the result of sum() is up-promoted to uint64 when I do that.
Travis, here are the outputs at my end - I cut out many zeros for brevity:

In [7]: print a.shape
(500, 972)
In [8]: print b.shape
(4993210,)
In [9]: print c.shape
(4993210, 972)

In [10]: c[100:].sum(axis=0)
Out[10]:
array([0, 0, 0, ..., 0])

In [11]: d = c[100:].sum(axis=-1)

In [12]: print d[:100]
[0 0 0 ... 0 0]

In [13]: print d[-100:]
[0 0 0 ... 0 0 0]

I looked at sparse subsamples with matplotlib - specifically,
imshow(a[::1000, :]) - and the a array looks correct (random values
everywhere), but c is zero past a certain row number. In fact, it looks
like it becomes zero at row 574519 - I think that for all rows in c beyond 
row 574519, the values will be zero. For lower row numbers, I think they are
correctly filled (at least, judging by the sparse view in matplotlib).

In [15]: a[b[574519], 350:360]
Out[15]: array([143, 155,  11,  30, 212, 149, 110, 164, 165, 120],
dtype=uint8)

In [16]: c[574519, 350:360]
Out[16]: array([143, 155,  11,  30, 212, 149,   0,   0,   0,   0],
dtype=uint8)


I'm using EPD 7.1, numpy 1.6.1, Linux installation (I don't know the kernel
details)

HTH,
Aronne


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-23 Thread David Warde-Farley
Hi Travis,

Thanks for your reply.

On Mon, Jan 23, 2012 at 01:33:42PM -0600, Travis Oliphant wrote:
 Can you determine where the problem is, precisely.In other words, can you 
 verify that c is not getting filled in correctly? 
 
 You are no doubt going to get overflow in the summation as you have a uint8 
 parameter.   But, having that overflow be exactly '0' would be surprising.  

I've already looked at this actually. The last 440 or so rows of c are
all zero, however 'a' seems to be filled in fine:

>>> import numpy
>>> a = numpy.array(numpy.random.randint(256,size=(500,972)),
...                 dtype=numpy.uint8)
>>> b = numpy.random.randint(500,size=(4993210,))
>>> c = a[b]
>>> print c
[[186 215 204 ..., 170  98 198]
 [ 56  98 112 ...,  32 233   1]
 [ 44 133 171 ..., 163  35  51]
 ..., 
 [  0   0   0 ...,   0   0   0]
 [  0   0   0 ...,   0   0   0]
 [  0   0   0 ...,   0   0   0]]
 print a
[[ 30 182  56 ..., 133 162 173]
 [112 100  69 ...,   3 147  80]
 [124  70 232 ..., 114 177  11]
 ..., 
 [ 22  42  31 ..., 141 196 134]
 [ 74  47 167 ...,  38 193   9]
 [162 228 190 ..., 150  18   1]]

So it seems to have nothing to do with the sum, but rather the advanced
indexing operation. The zeros seem to start in the middle of row 574519,
in particular at element 356. This is reproducible with different random
vectors of indices, it seems.

So at the 558432824th element things go awry. I can't say it makes any 
sense to me why this would be the magic number.
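
For what it's worth, the number is consistent with a 32-bit counter
wrapping modulo 2**32 (an assumption about the failure mode, but the
arithmetic lines up exactly with the observed row 574519, element 356):

```python
total = 4993210 * 972          # total uint8 elements to copy into c
print(total)                   # 4853400120, well past 2**32 - 1

leftover = total % 2**32       # where a wrapped 32-bit counter would stop
print(leftover)                # 558432824 -- the observed magic number

row, col = divmod(leftover, 972)
print(row, col)                # 574519 356 -- exactly where the zeros begin
```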

David


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-23 Thread David Warde-Farley
On Mon, Jan 23, 2012 at 08:38:44PM +0100, Robin wrote:
 On Mon, Jan 23, 2012 at 7:55 PM, David Warde-Farley
 warde...@iro.umontreal.ca wrote:
  I've reproduced this (rather serious) bug myself and confirmed that it 
  exists
  in master, and as far back as 1.4.1.
 
  I'd really appreciate if someone could reproduce and confirm on another
  machine, as so far all my testing has been on our single high-memory 
  machine.
 
 I see the same behaviour on a Winodows machine with numpy 1.6.1. But I
 don't think it is an indexing problem - rather something with the
 random number creation. a itself is already zeros for high indexes.
 
 In [8]: b[100:110]
 Out[8]:
 array([3429029, 1251819, 4292918, 2249483,  757620, 3977130, 3455449,
2005054, 2565207, 3114930])
 
 In [9]: a[b[100:110]]
 Out[9]:
 array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=uint8)
 
 In [41]: a[581350:,0].sum()
 Out[41]: 0

Hmm, this seems like a separate bug from mine. In mine, 'a' is indeed being
filled in -- the problem arises with c alone.

So, another Windows-specific bug to add to the pile, perhaps? :(

David


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-23 Thread Christoph Gohlke


On 1/23/2012 12:33 PM, David Warde-Farley wrote:
 On Mon, Jan 23, 2012 at 08:38:44PM +0100, Robin wrote:
 On Mon, Jan 23, 2012 at 7:55 PM, David Warde-Farley
 warde...@iro.umontreal.ca  wrote:
 I've reproduced this (rather serious) bug myself and confirmed that it 
 exists
 in master, and as far back as 1.4.1.

 I'd really appreciate if someone could reproduce and confirm on another
 machine, as so far all my testing has been on our single high-memory 
 machine.

 I see the same behaviour on a Winodows machine with numpy 1.6.1. But I
 don't think it is an indexing problem - rather something with the
 random number creation. a itself is already zeros for high indexes.
 
 In [8]: b[100:110]
 Out[8]:
 array([3429029, 1251819, 4292918, 2249483,  757620, 3977130, 3455449,
 2005054, 2565207, 3114930])

 In [9]: a[b[100:110]]
 Out[9]:
 array([[0, 0, 0, ..., 0, 0, 0],
 [0, 0, 0, ..., 0, 0, 0],
 [0, 0, 0, ..., 0, 0, 0],
 ...,
 [0, 0, 0, ..., 0, 0, 0],
 [0, 0, 0, ..., 0, 0, 0],
 [0, 0, 0, ..., 0, 0, 0]], dtype=uint8)

 In [41]: a[581350:,0].sum()
 Out[41]: 0

 Hmm, this seems like a separate bug to mine. In mine, 'a' is indeed being
 filled in -- the problem arises with c alone.

 So, another Windows-specific bug to add to the pile, perhaps? :(

 David


Maybe this explains the win-amd64 behavior: There are a couple of places 
in mtrand where array indices and sizes are C long instead of npy_intp, 
for example in the randint function:

https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L863

Christoph


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-23 Thread Sturla Molden
Den 23.01.2012 22:08, skrev Christoph Gohlke:
 Maybe this explains the win-amd64 behavior: There are a couple of places
 in mtrand where array indices and sizes are C long instead of npy_intp,
 for example in the randint function:

 https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L863



AFAIK, on AMD64 a C long is 64 bit on Linux (gcc) and 32 bit on Windows 
(gcc and MSVC).

Sturla


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-23 Thread Sturla Molden
Den 23.01.2012 22:08, skrev Christoph Gohlke:

 Maybe this explains the win-amd64 behavior: There are a couple of places
 in mtrand where array indices and sizes are C long instead of npy_intp,
 for example in the randint function:

 https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L863



Both i and length could overflow here. It should overflow on allocation 
of more than 2 GB.

There is also a lot of C longs in the internal state (line 55-105), as 
well as the other functions.

Producing 2 GB of random ints twice fails:

>>> import numpy as np
>>> np.random.randint(500,size=(2*1024**3,))
array([0, 0, 0, ..., 0, 0, 0])
>>> np.random.randint(500,size=(2*1024**3,))
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    np.random.randint(500,size=(2*1024**3,))
  File "mtrand.pyx", line 881, in mtrand.RandomState.randint 
(numpy\random\mtrand\mtrand.c:6040)
MemoryError


Sturla


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-23 Thread Sturla Molden
Den 24.01.2012 06:00, skrev Sturla Molden:
 Both i and length could overflow here. It should overflow on 
 allocation of more than 2 GB. There is also a lot of C longs in the 
 internal state (line 55-105), as well as the other functions.

The use of C long affects all the C and Pyrex source code in mtrand 
module, not just mtrand.pyx. All of it is fubar on Win64.

From the C standard, a C long is only guaranteed to be at least 32 bits 
wide. Thus a C long can only be expected to index up to 2**31 - 1, and 
this is not a Windows-specific problem.

So it seems there are hundreds of places in the mtrand module where 
integers can overflow on 64-bit Python.

Also the crappy old Pyrex code should be updated to some more recent Cython.

Sturla