Re: [Numpy-discussion] advanced indexing bug with huge arrays?
On 24.01.2012 17:19, David Warde-Farley wrote:
> Hmm. Seeing as the width of a C long is inconsistent, does this imply that
> the random number generator will produce different results on different
> platforms?

If it does, it is a C programming mistake. C code should never depend on the
exact size of a long, only on its minimum size. ISO C defines other data types
for when an exact integer size is needed (in stdint.h), but the ANSI C that
NumPy is written in does not provide them.

Sturla
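To make this concrete, here is a minimal C sketch (not NumPy source; the widths
noted in the comments are what common platforms use, not guarantees of the
standard):

#include <stdio.h>
#include <stdint.h>   /* C99 fixed-width types; not available in ANSI C (C89) */

int main(void)
{
    /* 'long' is only guaranteed to be at least 32 bits wide. */
    printf("sizeof(long)    = %u\n", (unsigned)sizeof(long));    /* 4 on Win64, 8 on Linux x86-64 */
    printf("sizeof(int32_t) = %u\n", (unsigned)sizeof(int32_t)); /* exactly 4 everywhere */
    printf("sizeof(int64_t) = %u\n", (unsigned)sizeof(int64_t)); /* exactly 8 everywhere */
    return 0;
}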
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
On 24.01.2012 23:30, David Warde-Farley wrote:
> I've figured it out. In numpy/core/src/multiarray/mapping.c, PyArray_GetMap
> is using an int for a counter variable where it should be using an npy_intp.
> I've filed a pull request at https://github.com/numpy/numpy/pull/188 with a
> regression test.

That is great :) Now we just need to fix mtrand.pyx and all this will be gone.

Sturla
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
On Tue, Jan 24, 2012 at 3:30 PM, David Warde-Farley
warde...@iro.umontreal.ca wrote:
> On Tue, Jan 24, 2012 at 01:02:44PM -0500, David Warde-Farley wrote:
>> On Tue, Jan 24, 2012 at 06:37:12PM +0100, Robin wrote:
>>> Yes - I get exactly the same numbers in 64-bit Windows with 1.6.1.
>> Alright, so that rules out platform-specific effects. I'll try and hunt
>> the bug down when I have some time, if someone more familiar with the
>> indexing code doesn't beat me to it.
> I've figured it out. In numpy/core/src/multiarray/mapping.c, PyArray_GetMap
> is using an int for a counter variable where it should be using an npy_intp.
> I've filed a pull request at https://github.com/numpy/numpy/pull/188 with a
> regression test.

I think this bug, or one like it, was reported a couple of years ago. But I
don't recall if there was ever a ticket opened.

Chuck
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
On 24.01.2012 06:32, Sturla Molden wrote:
> The use of C long affects all the C and Pyrex source code in the mtrand
> module, not just mtrand.pyx. All of it is fubar on Win64.

randomkit.c handles C long correctly, I think. There are different code paths
for 32- and 64-bit C long, and buffer sizes are size_t.

Sturla
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
On 24.01.2012 09:21, Sturla Molden wrote:
> randomkit.c handles C long correctly, I think. There are different code
> paths for 32- and 64-bit C long, and buffer sizes are size_t.

distributions.c takes C longs as parameters, e.g. for the binomial
distribution. mtrand.pyx correctly handles this, but it can give an unexpected
overflow error on 64-bit Windows:

In [1]: np.random.binomial(2**31, .5)
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
C:\Windows\system32\<ipython-input-1-000aa0626c42> in <module>()
----> 1 np.random.binomial(2**31, .5)

C:\Python27\lib\site-packages\numpy\random\mtrand.pyd in
mtrand.RandomState.binomial (numpy\random\mtrand\mtrand.c:13770)()

OverflowError: Python int too large to convert to C long

On systems where C longs are 64 bits, this is likely not to produce an error.
This raises the question of whether randomkit.c and distributions.c should
also be changed to use npy_intp, for consistency across all platforms. (I
assume we are not supporting 16-bit NumPy, in which case we would need C long
there...)

Sturla
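For reference, the boundary here is exactly LONG_MAX. A tiny C check
(hypothetical, just to show the limit on a platform with a 32-bit long, such
as Win64):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* With a 32-bit long, LONG_MAX == 2147483647 == 2**31 - 1, so the
     * binomial parameter 2**31 is one past what a C long can hold, and
     * the Python-int-to-C-long conversion must fail. With a 64-bit long
     * (e.g. Linux x86-64), the same value converts fine. */
    printf("LONG_MAX = %ld\n", (long)LONG_MAX);
    return 0;
}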
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
On 24.01.2012 06:32, Sturla Molden wrote:
> On 24.01.2012 06:00, Sturla Molden wrote:
>> Both i and length could overflow here. It should overflow on allocation of
>> more than 2 GB. There are also a lot of C longs in the internal state
>> (lines 55-105), as well as in the other functions.
> The use of C long affects all the C and Pyrex source code in the mtrand
> module, not just mtrand.pyx. All of it is fubar on Win64.

The coding is also inconsistent; compare for example:

https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L180
https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L201

Sturla
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
On Tue, Jan 24, 2012 at 08:37, Sturla Molden stu...@molden.no wrote:
> distributions.c takes C longs as parameters, e.g. for the binomial
> distribution. mtrand.pyx correctly handles this, but it can give an
> unexpected overflow error on 64-bit Windows:
>
> In [1]: np.random.binomial(2**31, .5)
> OverflowError: Python int too large to convert to C long
>
> On systems where C longs are 64 bits, this is likely not to produce an
> error. This raises the question of whether randomkit.c and distributions.c
> should also be changed to use npy_intp, for consistency across all
> platforms.

There are two different uses of long that you need to distinguish. One is for
sizes, and one is for parameters and values. The sizes should definitely be
upgraded to npy_intp. The latter shouldn't; these should remain as the default
integer type of Python and numpy, a C long.

The reason longs are used for sizes is that I wrote mtrand for Numeric and
Python 2.4 before numpy was even announced (and I don't think we had npy_intp
at the time I merged it into numpy, but I could be wrong). Using longs for
sizes was the order of the day. I don't think I had even touched a 64-bit
machine that wasn't a DEC Alpha at the time, so I knew very little about the
issues.

So yes, please, fix whatever you can.

--
Robert Kern
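A sketch of that convention in C (hypothetical signatures, loosely modeled on
distributions.c; the exact prototype of rk_binomial and the definition of
npy_intp are assumptions here):

#include <stdint.h>

typedef int64_t npy_intp;   /* pointer-sized on a 64-bit platform */

/* Distribution parameters stay C long, the default Python/numpy integer... */
extern long rk_binomial(void *state, long n, double p);

/* ...but the size of the output buffer is an npy_intp, so huge arrays can
 * be filled even where long is only 32 bits. */
static void fill_binomial(void *state, long n, double p,
                          npy_intp count, long *out)
{
    npy_intp i;
    for (i = 0; i < count; i++)
        out[i] = rk_binomial(state, n, p);
}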
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
On Tue, Jan 24, 2012 at 08:47, Sturla Molden stu...@molden.no wrote:
> The coding is also inconsistent; compare for example:
> https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L180
> https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L201

I'm sorry, what are you demonstrating there?

--
Robert Kern
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
On 24.01.2012 10:16, Robert Kern wrote:
> I'm sorry, what are you demonstrating there?

Both npy_intp and C long are used for sizes and indexing.

Sturla
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
On Tue, Jan 24, 2012 at 09:19, Sturla Molden stu...@molden.no wrote:
> On 24.01.2012 10:16, Robert Kern wrote:
>> I'm sorry, what are you demonstrating there?
> Both npy_intp and C long are used for sizes and indexing.

Ah, yes. I think Travis added the multiiter code to cont1_array(), which does
broadcasting, so he used npy_intp as is proper (and necessary to pass into the
multiiter API). The other functions don't do broadcasting, so he didn't touch
them.

--
Robert Kern
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
On 24.01.2012 10:15, Robert Kern wrote:
> There are two different uses of long that you need to distinguish. One is
> for sizes, and one is for parameters and values. The sizes should
> definitely be upgraded to npy_intp. The latter shouldn't; these should
> remain as the default integer type of Python and numpy, a C long.

Ok, that makes sense.

> The reason longs are used for sizes is that I wrote mtrand for Numeric and
> Python 2.4 before numpy was even announced (and I don't think we had
> npy_intp at the time I merged it into numpy, but I could be wrong). Using
> longs for sizes was the order of the day. I don't think I had even touched
> a 64-bit machine that wasn't a DEC Alpha at the time, so I knew very little
> about the issues.

On amd64 the native datatypes are actually a 64-bit pointer with a 32-bit
offset (contrary to what we see in Python and NumPy C sources), which is one
reason why C longs are still 32 bits in MSVC. Thus an array size (size_t)
should be 64 bits, but array indices (C long) should be 32 bits. But nobody
likes to code like that (e.g. we would need an extra 64-bit pointer as a
cursor if the buffer size overflows a C long), and I don't think using a
non-native 64-bit offset incurs a lot of extra overhead for the CPU. :-)

Sturla
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
On Tue, Jan 24, 2012 at 09:15:01AM +0000, Robert Kern wrote:
> There are two different uses of long that you need to distinguish. One is
> for sizes, and one is for parameters and values. The sizes should
> definitely be upgraded to npy_intp. The latter shouldn't; these should
> remain as the default integer type of Python and numpy, a C long.

Hmm. Seeing as the width of a C long is inconsistent, does this imply that the
random number generator will produce different results on different platforms?
Or do the state dynamics prevent it from ever growing in magnitude to the
point where that's an issue?

David
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
On Tue, Jan 24, 2012 at 06:00:05AM +0100, Sturla Molden wrote:
> On 23.01.2012 22:08, Christoph Gohlke wrote:
>> Maybe this explains the win-amd64 behavior: there are a couple of places
>> in mtrand where array indices and sizes are C long instead of npy_intp,
>> for example in the randint function:
>> https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L863
> Both i and length could overflow here. It should overflow on allocation of
> more than 2 GB. There are also a lot of C longs in the internal state
> (lines 55-105), as well as in the other functions.
> Producing 2 GB of random ints twice fails:

Sturla, since you seem to have access to Win64 machines, do you suppose you
could try this code:

a = numpy.ones((1, 972))
b = numpy.zeros((4993210,), dtype=int)
c = a[b]

and verify that there's a whole lot of 0s in the matrix? Specifically,

>>> c[574519:].sum()
356.0
>>> c[574520:].sum()
0.0

is the case on Linux 64-bit; is it the case on Windows 64?

Thanks a lot,
David
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
On Tue, Jan 24, 2012 at 6:24 PM, David Warde-Farley
warde...@iro.umontreal.ca wrote:
> Sturla, since you seem to have access to Win64 machines, do you suppose you
> could try this code:
>
> a = numpy.ones((1, 972))
> b = numpy.zeros((4993210,), dtype=int)
> c = a[b]
>
> and verify that there's a whole lot of 0s in the matrix? Specifically,
>
> >>> c[574519:].sum()
> 356.0
> >>> c[574520:].sum()
> 0.0
>
> is the case on Linux 64-bit; is it the case on Windows 64?

Yes - I get exactly the same numbers in 64-bit Windows with 1.6.1.

Cheers

Robin
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
On Tue, Jan 24, 2012 at 06:37:12PM +0100, Robin wrote:
> Yes - I get exactly the same numbers in 64-bit Windows with 1.6.1.

Alright, so that rules out platform-specific effects. I'll try and hunt the
bug down when I have some time, if someone more familiar with the indexing
code doesn't beat me to it.

David
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
On Tue, Jan 24, 2012 at 01:02:44PM -0500, David Warde-Farley wrote:
> On Tue, Jan 24, 2012 at 06:37:12PM +0100, Robin wrote:
>> Yes - I get exactly the same numbers in 64-bit Windows with 1.6.1.
> Alright, so that rules out platform-specific effects. I'll try and hunt the
> bug down when I have some time, if someone more familiar with the indexing
> code doesn't beat me to it.

I've figured it out. In numpy/core/src/multiarray/mapping.c, PyArray_GetMap is
using an int for a counter variable where it should be using an npy_intp. I've
filed a pull request at https://github.com/numpy/numpy/pull/188 with a
regression test.

David
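Schematically, this is the classic 32-bit-counter pattern. A C illustration
(not the actual mapping.c code; the loop shape and the npy_intp typedef are
assumptions for the sketch):

#include <stdint.h>

typedef int64_t npy_intp;

/* Gather n elements of src into dst according to idx, as fancy
 * indexing does. */
static void gather(const uint8_t *src, const npy_intp *idx,
                   uint8_t *dst, npy_intp n)
{
    int i;                          /* BUG: should be 'npy_intp i' */
    for (i = 0; i < (int)n; i++)    /* on common ABIs the count is
                                       truncated to 32 bits, so the
                                       tail of dst is never written */
        dst[i] = src[idx[i]];
}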
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
On 23.01.2012, at 11:23, David Warde-Farley wrote:
> a = numpy.array(numpy.random.randint(256,size=(500,972)),dtype='uint8')
> b = numpy.random.randint(500,size=(4993210,))
> c = a[b]
>
> In [14]: c[100:].sum()
> Out[14]: 0

Same here. Python 2.7.2, 64-bit, Mac OS X (Lion), 8 GB RAM,
numpy.__version__ = 2.0.0.dev-55472ca
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.1.00)]
Numpy built without llvm.
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
I've reproduced this (rather serious) bug myself and confirmed that it exists
in master, and as far back as 1.4.1. I'd really appreciate it if someone could
reproduce and confirm on another machine, as so far all my testing has been on
our single high-memory machine.

Thanks,
David

On Mon, Jan 23, 2012 at 05:23:28AM -0500, David Warde-Farley wrote:
> A colleague has run into this weird behaviour with NumPy 1.6.1, EPD 7.1-2,
> on Linux (Fedora Core 14) 64-bit:
>
> a = numpy.array(numpy.random.randint(256,size=(500,972)),dtype='uint8')
> b = numpy.random.randint(500,size=(4993210,))
> c = a[b]
>
> It seems c is not getting filled in full, namely:
>
> In [14]: c[100:].sum()
> Out[14]: 0
>
> I haven't been able to reproduce this quite yet; I'll try to find a machine
> with sufficient memory tomorrow. But does anyone have any insight in the
> mean time? It smells like some kind of integer overflow bug.
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
Can you determine where the problem is, precisely? In other words, can you
verify that c is not getting filled in correctly?

You are no doubt going to get overflow in the summation, as you have a uint8
parameter. But having that overflow be exactly '0' would be surprising.

Can you verify that a and b are getting created correctly? Also, 'c' should be
a 2-d array; can you verify that? Can you take the sum along the -1 axis and
the 0 axis separately:

print a.shape
print b.shape
print c.shape
c[100:].sum(axis=0)
d = c[100:].sum(axis=-1)
print d[:100]
print d[-100:]

On Jan 23, 2012, at 12:55 PM, David Warde-Farley wrote:
> I've reproduced this (rather serious) bug myself and confirmed that it
> exists in master, and as far back as 1.4.1. I'd really appreciate it if
> someone could reproduce and confirm on another machine, as so far all my
> testing has been on our single high-memory machine.
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
On Mon, Jan 23, 2012 at 7:55 PM, David Warde-Farley
warde...@iro.umontreal.ca wrote:
> I've reproduced this (rather serious) bug myself and confirmed that it
> exists in master, and as far back as 1.4.1. I'd really appreciate it if
> someone could reproduce and confirm on another machine, as so far all my
> testing has been on our single high-memory machine.

I see the same behaviour on a Windows machine with numpy 1.6.1. But I don't
think it is an indexing problem - rather something with the random number
creation. a itself is already zeros for high indexes:

In [8]: b[100:110]
Out[8]:
array([3429029, 1251819, 4292918, 2249483,  757620, 3977130, 3455449,
       2005054, 2565207, 3114930])

In [9]: a[b[100:110]]
Out[9]:
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=uint8)

In [41]: a[581350:,0].sum()
Out[41]: 0

Cheers

Robin
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
On Mon, Jan 23, 2012 at 1:33 PM, Travis Oliphant teoliph...@gmail.com wrote:
> Can you determine where the problem is, precisely? In other words, can you
> verify that c is not getting filled in correctly?
>
> You are no doubt going to get overflow in the summation, as you have a
> uint8 parameter. But having that overflow be exactly '0' would be
> surprising.
>
> Can you verify that a and b are getting created correctly? Also, 'c' should
> be a 2-d array; can you verify that? Can you take the sum along the -1 axis
> and the 0 axis separately.

I am getting the same results as David. It looks like c just stopped filling
in partway through the array. I don't think there is any overflow issue, since
the result of sum() is up-promoted to uint64 when I do that.

Travis, here are the outputs at my end - I cut out many zeros for brevity:

In [7]: print a.shape
(500, 972)

In [8]: print b.shape
(4993210,)

In [9]: print c.shape
(4993210, 972)

In [10]: c[100:].sum(axis=0)
Out[10]: array([0, 0, 0, ..., 0])

In [11]: d = c[100:].sum(axis=-1)

In [12]: print d[:100]
[0 0 0 ... 0 0]

In [13]: print d[-100:]
[0 0 0 ... 0 0 0]

I looked at sparse subsamples with matplotlib - specifically,
imshow(a[::1000, :]) - and the a array looks correct (random values
everywhere), but c is zero past a certain row number. In fact, it looks like
it becomes zero at row 574519 - I think for all rows in c beyond row 574519,
the values will be zero. For lower row numbers, I think they are correctly
filled (at least, judging by the sparse view in matplotlib).

In [15]: a[b[574519], 350:360]
Out[15]: array([143, 155,  11,  30, 212, 149, 110, 164, 165, 120], dtype=uint8)

In [16]: c[574519, 350:360]
Out[16]: array([143, 155,  11,  30, 212, 149,   0,   0,   0,   0], dtype=uint8)

I'm using EPD 7.1, numpy 1.6.1, on a Linux installation (I don't know the
kernel details).

HTH,
Aronne
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
Hi Travis,

Thanks for your reply.

On Mon, Jan 23, 2012 at 01:33:42PM -0600, Travis Oliphant wrote:
> Can you determine where the problem is, precisely? In other words, can you
> verify that c is not getting filled in correctly?
>
> You are no doubt going to get overflow in the summation, as you have a
> uint8 parameter. But having that overflow be exactly '0' would be
> surprising.

I've already looked at this, actually. The last 440 or so rows of c are all
zero; however, 'a' seems to be filled in fine:

import numpy
a = numpy.array(numpy.random.randint(256,size=(500,972)), dtype=numpy.uint8)
b = numpy.random.randint(500,size=(4993210,))
c = a[b]

print c
[[186 215 204 ..., 170  98 198]
 [ 56  98 112 ...,  32 233   1]
 [ 44 133 171 ..., 163  35  51]
 ...,
 [  0   0   0 ...,   0   0   0]
 [  0   0   0 ...,   0   0   0]
 [  0   0   0 ...,   0   0   0]]

print a
[[ 30 182  56 ..., 133 162 173]
 [112 100  69 ...,   3 147  80]
 [124  70 232 ..., 114 177  11]
 ...,
 [ 22  42  31 ..., 141 196 134]
 [ 74  47 167 ...,  38 193   9]
 [162 228 190 ..., 150  18   1]]

So it seems to have nothing to do with the sum, but rather with the advanced
indexing operation. The zeros seem to start in the middle of row 574519, at
element 356. This is reproducible with different random vectors of indices,
it seems. So at the 558432824th element, things go awry. I can't say it makes
any sense to me why this would be the magic number.

David
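For what it's worth, the magic number is exactly what a 32-bit counter would
produce: c has 4993210 * 972 = 4,853,400,120 elements, and that count
truncated to 32 bits is 558,432,824 - element 356 of row 574519, right where
the zeros begin. A quick check (plain arithmetic, nothing NumPy-specific):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t total = 4993210ULL * 972ULL;   /* elements in c: 4853400120 */
    uint32_t kept  = (uint32_t)total;       /* what a 32-bit counter holds */

    printf("total = %llu\n", (unsigned long long)total);
    printf("kept  = %u\n", kept);                 /* 558432824 */
    printf("row %u, element %u\n",
           kept / 972u, kept % 972u);             /* 574519, 356 */
    return 0;
}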
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
On Mon, Jan 23, 2012 at 08:38:44PM +0100, Robin wrote:
> I see the same behaviour on a Windows machine with numpy 1.6.1. But I don't
> think it is an indexing problem - rather something with the random number
> creation. a itself is already zeros for high indexes:
>
> In [41]: a[581350:,0].sum()
> Out[41]: 0

Hmm, this seems like a separate bug to mine. In mine, 'a' is indeed being
filled in -- the problem arises with c alone.

So, another Windows-specific bug to add to the pile, perhaps? :(

David
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
On 1/23/2012 12:33 PM, David Warde-Farley wrote:
> Hmm, this seems like a separate bug to mine. In mine, 'a' is indeed being
> filled in -- the problem arises with c alone.
>
> So, another Windows-specific bug to add to the pile, perhaps? :(

Maybe this explains the win-amd64 behavior: there are a couple of places in
mtrand where array indices and sizes are C long instead of npy_intp, for
example in the randint function:

https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L863

Christoph
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
On 23.01.2012 22:08, Christoph Gohlke wrote:
> Maybe this explains the win-amd64 behavior: there are a couple of places in
> mtrand where array indices and sizes are C long instead of npy_intp, for
> example in the randint function:
> https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L863

AFAIK, on AMD64 a C long is 64 bits on Linux (gcc) and 32 bits on Windows (gcc
and MSVC).

Sturla
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
On 23.01.2012 22:08, Christoph Gohlke wrote:
> Maybe this explains the win-amd64 behavior: there are a couple of places in
> mtrand where array indices and sizes are C long instead of npy_intp, for
> example in the randint function:
> https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L863

Both i and length could overflow here. It should overflow on allocation of
more than 2 GB. There are also a lot of C longs in the internal state (lines
55-105), as well as in the other functions.

Producing 2 GB of random ints twice fails:

>>> import numpy as np
>>> np.random.randint(500,size=(2*1024**3,))
array([0, 0, 0, ..., 0, 0, 0])
>>> np.random.randint(500,size=(2*1024**3,))
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    np.random.randint(500,size=(2*1024**3,))
  File "mtrand.pyx", line 881, in mtrand.RandomState.randint
    (numpy\random\mtrand\mtrand.c:6040)
MemoryError

Sturla
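One plausible reading of the first call returning all zeros - a sketch under
the assumption that the requested length lands in a 32-bit C long, as in
mtrand.pyx's randint on Win64:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int64_t requested = 2LL * 1024 * 1024 * 1024;  /* 2*1024**3 == 2**31 */
    int32_t length = (int32_t)requested;           /* implementation-defined;
                                                      typically -2147483648 */

    printf("requested = %lld\n", (long long)requested);
    printf("as a 32-bit long = %ld\n", (long)length);

    /* A fill loop 'for (i = 0; i < length; i++)' then never executes,
     * so the freshly allocated buffer is returned unfilled (zeros). */
    return 0;
}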
Re: [Numpy-discussion] advanced indexing bug with huge arrays?
On 24.01.2012 06:00, Sturla Molden wrote:
> Both i and length could overflow here. It should overflow on allocation of
> more than 2 GB. There are also a lot of C longs in the internal state
> (lines 55-105), as well as in the other functions.

The use of C long affects all the C and Pyrex source code in the mtrand
module, not just mtrand.pyx. All of it is fubar on Win64.

Per the C standard, a C long is only guaranteed to be at least 32 bits wide.
Thus a C long can only be expected to index up to 2**31 - 1, and this is not a
Windows-specific problem. So it seems there are hundreds of places in the
mtrand module where integers can overflow on 64-bit Python. Also, the crappy
old Pyrex code should be updated to more recent Cython.

Sturla