Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-23 Thread Sturla Molden
On 24.01.2012 06:00, Sturla Molden wrote:
> Both i and length could overflow here. It should overflow on 
> allocation of more than 2 GB. There are also a lot of C longs in the 
> internal state (lines 55-105), as well as in the other functions.

The use of C long affects all the C and Pyrex source code in the mtrand 
module, not just mtrand.pyx. All of it is fubar on Win64.

From the C standard, a C long is only guaranteed to be "at least 32 
bits wide". Thus a C long can only be expected to index up to 2**31 - 1, 
and this is not a Windows-specific problem.
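
A quick sketch of the boundary (my own illustration, not from the original 
post), showing that the 2 GB randint request further down this thread already 
exceeds what a 32-bit C long can index:

 >>> import numpy as np
 >>> np.iinfo(np.int32).max      # largest value a 32-bit C long can hold
 2147483647
 >>> 2*1024**3                   # number of elements requested from randint
 2147483648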

So it seems there are hundreds of places in the mtrand module where 
integers can overflow on 64-bit Python.

Also, the crappy old Pyrex code should be updated to a more recent Cython.

Sturla


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-23 Thread Sturla Molden
On 23.01.2012 22:08, Christoph Gohlke wrote:
>
> Maybe this explains the win-amd64 behavior: There are a couple of places
> in mtrand where array indices and sizes are C long instead of npy_intp,
> for example in the randint function:
>
> 
>
>

Both i and length could overflow here. It should overflow on allocation 
of more than 2 GB.

There are also a lot of C longs in the internal state (lines 55-105), as 
well as in the other functions.

Producing 2 GB of random ints twice fails:

 >>> import numpy as np
 >>> np.random.randint(500,size=(2*1024**3,))
array([0, 0, 0, ..., 0, 0, 0])
 >>> np.random.randint(500,size=(2*1024**3,))

Traceback (most recent call last):
   File "", line 1, in 
 np.random.randint(500,size=(2*1024**3,))
   File "mtrand.pyx", line 881, in mtrand.RandomState.randint 
(numpy\random\mtrand\mtrand.c:6040)
MemoryError
 >>>


Sturla


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-23 Thread Sturla Molden
On 23.01.2012 22:08, Christoph Gohlke wrote:
> Maybe this explains the win-amd64 behavior: There are a couple of places
> in mtrand where array indices and sizes are C long instead of npy_intp,
> for example in the randint function:
>
> 
>
>

AFAIK, on AMD64 a C long is 64 bits on Linux (gcc) and 32 bits on Windows 
(gcc and MSVC).
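
A quick way to check this on any given machine (my addition, just for 
illustration):

 >>> import ctypes, numpy as np
 >>> ctypes.sizeof(ctypes.c_long)   # 8 on 64-bit Linux (LP64), 4 on 64-bit Windows (LLP64)
 8
 >>> np.dtype(np.intp).itemsize     # npy_intp is pointer-sized, so 8 on any 64-bit build
 8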

Sturla


Re: [Numpy-discussion] 'Advanced' save and restore operation

2012-01-23 Thread Derek Homeier
On 24 Jan 2012, at 01:45, Olivier Delalleau wrote:

> Not sure if there's a better way, but you can do it with some custom load 
> and save functions:
> 
> >>> with open('f.txt', 'w') as f:
> ... f.write(str(x.dtype) + '\n')
> ... numpy.savetxt(f, x)
> 
> >>> with open('f.txt') as f:
> ... dtype = f.readline().strip()
> ... y = numpy.loadtxt(f).astype(dtype)
> 
> I'm not sure how that'd work with structured arrays though. For the dict of 
> parameters you'd have to write your own load/save piece of code too if you 
> need a clean text file.
> 
> -=- Olivier
> 
> 2012/1/23 Emmanuel Mayssat 
> After having saved data, I need to know/remember the data dtype to
> restore it correctly.
> Is there a way to save the dtype with the data?
> (I guess the header parameter of savedata could help, but it is
> only available in v2.0+ )
> 
> I would like to save several related structured arrays and a dictionary
> of parameters into a TEXT file.
> Is there an easy way to do that?
> (maybe xml file, or maybe archive zip file of other files, or ... )
> 
> Any recommendation is helpful.

asciitable might be of some help, but to implement all of your required 
functionality, you'd probably still have to implement your own Reader class:

http://cxc.cfa.harvard.edu/contrib/asciitable/
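
Just to sketch what such a custom writer/reader could look like (my own rough 
example, not part of asciitable; it assumes the parameter dict holds only 
Python literals):

import ast
import numpy as np

def save_with_params(fname, arr, params):
    # one header line with the parameter dict, one with the dtype, then the records
    with open(fname, 'w') as f:
        f.write(repr(params) + '\n')
        f.write(repr(arr.dtype.descr) + '\n')
        for rec in arr:
            f.write(','.join(v.decode() if isinstance(v, bytes) else str(v)
                             for v in rec) + '\n')

def load_with_params(fname):
    with open(fname) as f:
        params = ast.literal_eval(f.readline())
        dtype = np.dtype(ast.literal_eval(f.readline()))
        data = np.loadtxt(f, delimiter=',', dtype=dtype)
    return data, params

Round-tripping the dtype via dtype.descr and the parameters via 
repr()/ast.literal_eval() keeps everything in one readable text file.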

Cheers,
Derek



Re: [Numpy-discussion] 'Advanced' save and restore operation

2012-01-23 Thread Olivier Delalleau
Not sure if there's a better way, but you can do it with some custom load
and save functions:

>>> with open('f.txt', 'w') as f:
... f.write(str(x.dtype) + '\n')
... numpy.savetxt(f, x)

>>> with open('f.txt') as f:
... dtype = f.readline().strip()
... y = numpy.loadtxt(f).astype(dtype)

I'm not sure how that'd work with structured arrays though. For the dict of
parameters you'd have to write your own load/save piece of code too if you
need a clean text file.
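
For completeness, a self-contained round trip along those lines (my example, 
assuming a plain numeric array):

import numpy
x = numpy.arange(6, dtype=numpy.float32).reshape(3, 2)

with open('f.txt', 'w') as f:
    f.write(str(x.dtype) + '\n')        # first line holds the dtype name
    numpy.savetxt(f, x)                 # the data follows as plain text

with open('f.txt') as f:
    dtype = f.readline().strip()
    y = numpy.loadtxt(f).astype(dtype)  # loadtxt gives float64, astype restores float32

print(y.dtype)         # float32
print((y == x).all())  # True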

-=- Olivier

2012/1/23 Emmanuel Mayssat 

> After having saved data, I need to know/remember the data dtype to
> restore it correctly.
> Is there a way to save the dtype with the data?
> (I guess the header parameter of savedata could help, but it is
> only available in v2.0+ )
>
> I would like to save several related structured arrays and a dictionary
> of parameters into a TEXT file.
> Is there an easy way to do that?
> (maybe xml file, or maybe archive zip file of other files, or ... )
>
> Any recommendation is helpful.
>
> Regards,
> --
> Emmanuel


Re: [Numpy-discussion] Working with MATLAB

2012-01-23 Thread Jaidev Deshpande
Please ignore my question. I found what I needed on the scipy website.

I asked the question in haste.

I'm sorry.

Thanks


[Numpy-discussion] Working with MATLAB

2012-01-23 Thread Jaidev Deshpande
Dear List,

I frequently work with MATLAB, and many times I need to adapt MATLAB code
to work with NumPy arrays.

While for most practical purposes it works fine, I think there might
be a lot of 'under the hood' things that I might be missing when I
make the translations from MATLAB to Python.

Are there any 'best practices' for working on this transition?

Thanks


[Numpy-discussion] 'Advanced' save and restore operation

2012-01-23 Thread Emmanuel Mayssat
After having saved data, I need to know/remember the data dtype to
restore it correctly.
Is there a way to save the dtype with the data?
(I guess the header parameter of savedata could help, but it is
only available in v2.0+ )

I would like to save several related structured arrays and a dictionary
of parameters into a TEXT file.
Is there an easy way to do that?
(maybe xml file, or maybe archive zip file of other files, or ... )

Any recommendation is helpful.

Regards,
--
Emmanuel


Re: [Numpy-discussion] Saving and loading a structured array from a TEXT file

2012-01-23 Thread Derek Homeier
On 23 Jan 2012, at 22:07, Derek Homeier wrote:

>> In [4]: r = np.ones(3,dtype=[('name', '|S5'), ('foo', '<i8'), ('bar', '<f8')])
>> In [5]: r.tofile('toto.txt',sep='\n')
>> 
>> bash-4.2$ cat toto.txt
>> ('1', 1, 1.0)
>> ('1', 1, 1.0)
>> ('1', 1, 1.0)
>> 
> 
> cnv =  {0: lambda s: s.lstrip('('), -1: lambda s: s.rstrip(')')}
> r2 = np.loadtxt('toto.txt', delimiter=',', converters=cnv, dtype=r.dtype)
> 
> Generally loadtxt works more smoothly together with savetxt, but the latter
> unfortunately does not offer an easy way to save structured arrays (note to
> self and others currently working on npyio: definitely room for improvement!).

For the record, in that example

np.savetxt('toto.txt', r, fmt='%s,%d,%f')

would work as well, saving you the custom converter for loadtxt - it could just
become tedious to work out the format for more complex structures, so an option
to construct this automatically from r.dtype could certainly be a nice enhancement.
Just wondering, is there something like the inverse operator to np.format_parser,
i.e. mapping each dtype to a default print format specifier?
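
A crude mapping along those lines is easy to sketch (an illustration added 
here, not from the original mail; the chosen format specifiers are arbitrary):

def fmt_from_dtype(dt, sep=','):
    # map each field's kind to a default print format specifier
    kind2fmt = {'S': '%s', 'i': '%d', 'u': '%d', 'f': '%f', 'b': '%d'}
    return sep.join(kind2fmt[dt[name].kind] for name in dt.names)

For the r above this returns '%s,%d,%f', so
np.savetxt('toto.txt', r, fmt=fmt_from_dtype(r.dtype)) matches the hand-written format.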

Cheers,
Derek



Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-23 Thread Christoph Gohlke


On 1/23/2012 12:33 PM, David Warde-Farley wrote:
> On Mon, Jan 23, 2012 at 08:38:44PM +0100, Robin wrote:
>> On Mon, Jan 23, 2012 at 7:55 PM, David Warde-Farley
>>   wrote:
>>> I've reproduced this (rather serious) bug myself and confirmed that it 
>>> exists
>>> in master, and as far back as 1.4.1.
>>>
>>> I'd really appreciate if someone could reproduce and confirm on another
>>> machine, as so far all my testing has been on our single high-memory 
>>> machine.
>>
>> I see the same behaviour on a Windows machine with numpy 1.6.1. But I
>> don't think it is an indexing problem - rather something with the
>> random number creation. a itself is already zeros for high indexes.
>> 
>> In [8]: b[100:110]
>> Out[8]:
>> array([3429029, 1251819, 4292918, 2249483,  757620, 3977130, 3455449,
>> 2005054, 2565207, 3114930])
>>
>> In [9]: a[b[100:110]]
>> Out[9]:
>> array([[0, 0, 0, ..., 0, 0, 0],
>> [0, 0, 0, ..., 0, 0, 0],
>> [0, 0, 0, ..., 0, 0, 0],
>> ...,
>> [0, 0, 0, ..., 0, 0, 0],
>> [0, 0, 0, ..., 0, 0, 0],
>> [0, 0, 0, ..., 0, 0, 0]], dtype=uint8)
>>
>> In [41]: a[581350:,0].sum()
>> Out[41]: 0
>
> Hmm, this seems like a separate bug to mine. In mine, 'a' is indeed being
> filled in -- the problem arises with c alone.
>
> So, another Windows-specific bug to add to the pile, perhaps? :(
>
> David


Maybe this explains the win-amd64 behavior: There are a couple of places 
in mtrand where array indices and sizes are C long instead of npy_intp, 
for example in the randint function:



Christoph


Re: [Numpy-discussion] Saving and loading a structured array from a TEXT file

2012-01-23 Thread Derek Homeier
On 23 Jan 2012, at 21:15, Emmanuel Mayssat wrote:

> Is there a way to save a structured array in a text file?
> My problem is not so much in the saving procedure, but rather in the
> 'reloading' procedure.
> See below
> 
> 
> In [3]: import numpy as np
> 
> In [4]: r = np.ones(3,dtype=[('name', '|S5'), ('foo', '<i8'), ('bar', '<f8')])
> In [5]: r.tofile('toto.txt',sep='\n')
> 
> bash-4.2$ cat toto.txt
> ('1', 1, 1.0)
> ('1', 1, 1.0)
> ('1', 1, 1.0)
> 
> In [7]: r2 = np.fromfile('toto.txt',sep='\n',dtype=r.dtype)
> ---
> ValueErrorTraceback (most recent call last)
> /home/cls1fs/clseng/10/ in ()
> > 1 r2 = np.fromfile('toto.txt',sep='\n',dtype=r.dtype)
> 
> ValueError: Unable to read character files of that array type

I think most of the np.fromfile functionality works for binary input; for
reading text input, np.loadtxt and np.genfromtxt are the (currently)
recommended functions. It is a bit tricky to read the format generated by
tofile() in the above example, but the following should work:

cnv =  {0: lambda s: s.lstrip('('), -1: lambda s: s.rstrip(')')}
r2 = np.loadtxt('toto.txt', delimiter=',', converters=cnv, dtype=r.dtype)

Generally loadtxt works more smoothly together with savetxt, but the latter
unfortunately does not offer an easy way to save structured arrays (note to
self and others currently working on npyio: definitely room for improvement!).

HTH,
Derek



Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-23 Thread David Warde-Farley
On Mon, Jan 23, 2012 at 08:38:44PM +0100, Robin wrote:
> On Mon, Jan 23, 2012 at 7:55 PM, David Warde-Farley
>  wrote:
> > I've reproduced this (rather serious) bug myself and confirmed that it 
> > exists
> > in master, and as far back as 1.4.1.
> >
> > I'd really appreciate if someone could reproduce and confirm on another
> > machine, as so far all my testing has been on our single high-memory 
> > machine.
> 
> I see the same behaviour on a Windows machine with numpy 1.6.1. But I
> don't think it is an indexing problem - rather something with the
> random number creation. a itself is already zeros for high indexes.
> 
> In [8]: b[100:110]
> Out[8]:
> array([3429029, 1251819, 4292918, 2249483,  757620, 3977130, 3455449,
>2005054, 2565207, 3114930])
> 
> In [9]: a[b[100:110]]
> Out[9]:
> array([[0, 0, 0, ..., 0, 0, 0],
>[0, 0, 0, ..., 0, 0, 0],
>[0, 0, 0, ..., 0, 0, 0],
>...,
>[0, 0, 0, ..., 0, 0, 0],
>[0, 0, 0, ..., 0, 0, 0],
>[0, 0, 0, ..., 0, 0, 0]], dtype=uint8)
> 
> In [41]: a[581350:,0].sum()
> Out[41]: 0

Hmm, this seems like a separate bug to mine. In mine, 'a' is indeed being
filled in -- the problem arises with c alone. 

So, another Windows-specific bug to add to the pile, perhaps? :(

David


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-23 Thread David Warde-Farley
Hi Travis,

Thanks for your reply.

On Mon, Jan 23, 2012 at 01:33:42PM -0600, Travis Oliphant wrote:
> Can you determine where the problem is, precisely? In other words, can you 
> verify that c is not getting filled in correctly? 
> 
> You are no doubt going to get overflow in the summation as you have a uint8 
> parameter.   But, having that overflow be exactly '0' would be surprising.  

I've already looked at this actually. The last 440 or so rows of c are
all zero, however 'a' seems to be filled in fine:

>>> import numpy
>>> a = numpy.array(numpy.random.randint(256,size=(500,972)), dtype=numpy.uint8)
>>> b = numpy.random.randint(500,size=(4993210,))
>>> c = a[b]
>>> print c
[[186 215 204 ..., 170  98 198]
 [ 56  98 112 ...,  32 233   1]
 [ 44 133 171 ..., 163  35  51]
 ..., 
 [  0   0   0 ...,   0   0   0]
 [  0   0   0 ...,   0   0   0]
 [  0   0   0 ...,   0   0   0]]
>>> print a
[[ 30 182  56 ..., 133 162 173]
 [112 100  69 ...,   3 147  80]
 [124  70 232 ..., 114 177  11]
 ..., 
 [ 22  42  31 ..., 141 196 134]
 [ 74  47 167 ...,  38 193   9]
 [162 228 190 ..., 150  18   1]]

So it seems to have nothing to do with the sum, but rather the advanced
indexing operation. The zeros seem to start in the middle of row 574519,
in particular at element 356. This is reproducible with different random
vectors of indices, it seems.

So at the 558432824th element things go awry. I can't say it makes any sense
to me why this would be the magic number.
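
(For anyone wanting to reproduce the hunt, a rough way to locate the first 
unfilled element, using the a, b, c from the session above -- my sketch, not 
the code actually used:)

row = 574519                                   # first row reported as partially filled
bad = numpy.flatnonzero(c[row] != a[b[row]])   # columns where c stops matching a[b]
print(bad[0])                                  # 356
print(row * a.shape[1] + bad[0])               # 558432824, the flat element index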

David


[Numpy-discussion] Saving and loading a structured array from a TEXT file

2012-01-23 Thread Emmanuel Mayssat
Is there a way to save a structured array in a text file?
My problem is not so much in the saving procedure, but rather in the
'reloading' procedure.
See below


In [3]: import numpy as np

In [4]: r = np.ones(3,dtype=[('name', '|S5'), ('foo', '<i8'), ('bar', '<f8')])

In [5]: r.tofile('toto.txt',sep='\n')

bash-4.2$ cat toto.txt
('1', 1, 1.0)
('1', 1, 1.0)
('1', 1, 1.0)

In [7]: r2 = np.fromfile('toto.txt',sep='\n',dtype=r.dtype)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/home/cls1fs/clseng/10/ in ()
----> 1 r2 = np.fromfile('toto.txt',sep='\n',dtype=r.dtype)

ValueError: Unable to read character files of that array type


--
Emmanuel


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-23 Thread Aronne Merrelli
On Mon, Jan 23, 2012 at 1:33 PM, Travis Oliphant wrote:

> Can you determine where the problem is, precisely? In other words, can
> you verify that c is not getting filled in correctly?
>
> You are no doubt going to get overflow in the summation as you have a
> uint8 parameter.   But, having that overflow be exactly '0' would be
> surprising.
>
> Can you verify that a and b are getting created correctly?   Also, 'c'
> should be a 2-d array, can you verify that?  Can you take the sum along the
> -1 axis and the 0 axis separately:
>
> print a.shape
> print b.shape
> print c.shape
>
> c[100:].sum(axis=0)
> d = c[100:].sum(axis=-1)
> print d[:100]
> print d[-100:]
>


I am getting the same results as David. It looks like c just "stopped
filling in" partway through the array. I don't think there is any overflow
issue, since the result of sum() is up-promoted to uint64 when I do that.
Travis, here are the outputs at my end - I cut out many zeros for brevity:

In [7]: print a.shape
(500, 972)
In [8]: print b.shape
(4993210,)
In [9]: print c.shape
(4993210, 972)

In [10]: c[100:].sum(axis=0)
Out[10]:
array([0, 0, 0,  , 0])

In [11]: d = c[100:].sum(axis=-1)

In [12]: print d[:100]
[0 0 0 ... 0 0]

In [13]: print d[-100:]
[0 0 0 ... 0 0 0]

I looked at sparse subsamples with matplotlib - specifically,
imshow(a[::1000, :]) - and the a array looks correct (random values
everywhere), but c is zero past a certain row number. In fact, it looks
like it becomes zero at row 574519 - I think for all rows in c beyond row
574519, the values will be zero. For lower row numbers, I think they are
correctly filled (at least, by the sparse view in matplotlib).

In [15]: a[b[574519], 350:360]
Out[15]: array([143, 155,  11,  30, 212, 149, 110, 164, 165, 120],
dtype=uint8)

In [16]: c[574519, 350:360]
Out[16]: array([143, 155,  11,  30, 212, 149,   0,   0,   0,   0],
dtype=uint8)


I'm using EPD 7.1, numpy 1.6.1, Linux installation (I don't know the kernel
details)

HTH,
Aronne


Re: [Numpy-discussion] Counting the Colors of RGB-Image

2012-01-23 Thread elodw
On 23.01.2012 18:17, Chris Barker wrote:
> On Wed, Jan 18, 2012 at 1:26 AM,  wrote:
>> Your ideas are very helpful and the code is very fast.
> I'm curious -- a number of ideas were floated here -- what did you end up 
> using?
>
> -Chris
>
>
I'm sorry, but when I see the code of Torgil Svenson,
I think, "the game is over".

I use the following code:

import numpy as np
from time import clock

# n_im2 is the RGB image, a uint8 array of shape (height, width, 3)
t0 = clock()

tt = n_im2.view()
tt.shape = -1, 3
# pack each (r, g, b) triple into a single 24-bit integer
ifl = tt[...,0].astype(np.int)*256*256 + tt[...,1].astype(np.int)*256 + \
      tt[...,2].astype(np.int)
# unique packed colors; a histogram over their bin edges gives the counts
colors, inv = np.unique(ifl, return_inverse=True)

zus = np.array([colors[-1]+1])
colplus = np.hstack((colors, zus))
ccnt = np.histogram(ifl, colplus)[0]

t1 = clock()
print (t1-t0)
t0 = t1
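
(If it helps anyone reading along: the packed 24-bit codes can be unpacked 
again, e.g. to list the most frequent colors -- my addition, assuming the 
colors/ccnt arrays from the code above.)

order = np.argsort(ccnt)[::-1]               # most frequent packed colors first
for code, cnt in zip(colors[order][:10], ccnt[order][:10]):
    r, g, b = code // (256*256), (code // 256) % 256, code % 256
    print (r, g, b, cnt)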


> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R(206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> chris.bar...@noaa.gov



Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-23 Thread Robin
On Mon, Jan 23, 2012 at 7:55 PM, David Warde-Farley
 wrote:
> I've reproduced this (rather serious) bug myself and confirmed that it exists
> in master, and as far back as 1.4.1.
>
> I'd really appreciate if someone could reproduce and confirm on another
> machine, as so far all my testing has been on our single high-memory machine.

I see the same behaviour on a Windows machine with numpy 1.6.1. But I
don't think it is an indexing problem - rather something with the
random number creation. a itself is already zeros for high indexes.

In [8]: b[100:110]
Out[8]:
array([3429029, 1251819, 4292918, 2249483,  757620, 3977130, 3455449,
   2005054, 2565207, 3114930])

In [9]: a[b[100:110]]
Out[9]:
array([[0, 0, 0, ..., 0, 0, 0],
   [0, 0, 0, ..., 0, 0, 0],
   [0, 0, 0, ..., 0, 0, 0],
   ...,
   [0, 0, 0, ..., 0, 0, 0],
   [0, 0, 0, ..., 0, 0, 0],
   [0, 0, 0, ..., 0, 0, 0]], dtype=uint8)

In [41]: a[581350:,0].sum()
Out[41]: 0

Cheers

Robin
>
> Thanks,
> David
>
> On Mon, Jan 23, 2012 at 05:23:28AM -0500, David Warde-Farley wrote:
>> A colleague has run into this weird behaviour with NumPy 1.6.1, EPD 7.1-2, 
>> on Linux (Fedora Core 14) 64-bit:
>>
>> > a = numpy.array(numpy.random.randint(256,size=(500,972)),dtype='uint8')
>> > b = numpy.random.randint(500,size=(4993210,))
>> > c = a[b]
>>
>> It seems c is not getting filled in full, namely:
>>
>> > In [14]: c[100:].sum()
>> > Out[14]: 0
>>
>> I haven't been able to reproduce this quite yet, I'll try to find a machine 
>> with sufficient memory tomorrow. But does anyone have any insight in the 
>> mean time? It smells like some kind of integer overflow bug.
>>
>> Thanks,
>>
>> David


Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-23 Thread Travis Oliphant
Can you determine where the problem is, precisely? In other words, can you 
verify that c is not getting filled in correctly? 

You are no doubt going to get overflow in the summation as you have a uint8 
parameter.   But, having that overflow be exactly '0' would be surprising.  

Can you verify that a and b are getting created correctly?   Also, 'c' should 
be a 2-d array, can you verify that?  Can you take the sum along the -1 axis 
and the 0 axis separately: 

print a.shape
print b.shape
print c.shape

c[100:].sum(axis=0)
d = c[100:].sum(axis=-1)
print d[:100]
print d[-100:]



On Jan 23, 2012, at 12:55 PM, David Warde-Farley wrote:

> I've reproduced this (rather serious) bug myself and confirmed that it exists
> in master, and as far back as 1.4.1.
> 
> I'd really appreciate if someone could reproduce and confirm on another
> machine, as so far all my testing has been on our single high-memory machine.
> 
> Thanks,
> David
> 
> On Mon, Jan 23, 2012 at 05:23:28AM -0500, David Warde-Farley wrote:
>> A colleague has run into this weird behaviour with NumPy 1.6.1, EPD 7.1-2, 
>> on Linux (Fedora Core 14) 64-bit:
>> 
>>> a = numpy.array(numpy.random.randint(256,size=(500,972)),dtype='uint8')
>>> b = numpy.random.randint(500,size=(4993210,))
>>> c = a[b]
>> 
>> It seems c is not getting filled in full, namely:
>> 
>>> In [14]: c[100:].sum()
>>> Out[14]: 0
>> 
>> I haven't been able to reproduce this quite yet, I'll try to find a machine 
>> with sufficient memory tomorrow. But does anyone have any insight in the 
>> mean time? It smells like some kind of integer overflow bug.
>> 
>> Thanks,
>> 
>> David



Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-23 Thread David Warde-Farley
I've reproduced this (rather serious) bug myself and confirmed that it exists
in master, and as far back as 1.4.1.

I'd really appreciate if someone could reproduce and confirm on another
machine, as so far all my testing has been on our single high-memory machine.

Thanks,
David

On Mon, Jan 23, 2012 at 05:23:28AM -0500, David Warde-Farley wrote:
> A colleague has run into this weird behaviour with NumPy 1.6.1, EPD 7.1-2, on 
> Linux (Fedora Core 14) 64-bit:
> 
> > a = numpy.array(numpy.random.randint(256,size=(500,972)),dtype='uint8')
> > b = numpy.random.randint(500,size=(4993210,))
> > c = a[b]
> 
> It seems c is not getting filled in full, namely:
> 
> > In [14]: c[100:].sum()
> > Out[14]: 0
> 
> I haven't been able to reproduce this quite yet, I'll try to find a machine 
> with sufficient memory tomorrow. But does anyone have any insight in the mean 
> time? It smells like some kind of integer overflow bug.
> 
> Thanks,
> 
> David


Re: [Numpy-discussion] Counting the Colors of RGB-Image

2012-01-23 Thread Chris Barker
On Wed, Jan 18, 2012 at 1:26 AM,   wrote:
> Your ideas are very helpful and the code is very fast.

I'm curious -- a number of ideas were floated here -- what did you end up using?

-Chris


--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-23 Thread Samuel John
I'd like to add 
http://git.tiker.net/pyopencl.git/blob/HEAD:/examples/demo_mandelbrot.py to the 
discussion, since I use pyopencl  (http://mathema.tician.de/software/pyopencl) 
with great success in my daily scientific computing. Install with pip.

PyOpenCL does understand numpy arrays. You write a kernel (a small C program)
directly into a Python triple-quoted string and get a pythonic way to program
GPUs and Core i5/i7 CPUs, with a Python exception if something goes wrong.
Whenever I hit a speed bottleneck that I cannot solve with pure numpy, I code a
little part of the computation for the GPU. The compilation is done just in
time, when you run the Python code.
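
(The pattern looks roughly like this -- a minimal sketch of my own, not 
Samuel's code; the kernel just doubles an array in place.)

import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

src = """
__kernel void twice(__global float *a)
{
    int i = get_global_id(0);
    a[i] = 2.0f * a[i];
}
"""
prg = cl.Program(ctx, src).build()        # compiled just in time, when the script runs

a = np.arange(16, dtype=np.float32)
mf = cl.mem_flags
buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=a)
prg.twice(queue, a.shape, None, buf)      # one work item per array element
cl.enqueue_copy(queue, a, buf)            # copy the result back into the numpy array
print(a)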

Especially for the Mandelbrot set this may be a _huge_ gain in speed, since it
is embarrassingly parallel.

Samuel


On 23.01.2012, at 14:02, Robert Cimrman wrote:

> On 01/23/12 13:51, Sturla Molden wrote:
>> On 23.01.2012 13:09, Sebastian Haase wrote:
>>> 
>>> I would think that interactive zooming would be quite nice
>>> ("illuminating")   and for that 13 secs would not be tolerable
>>> Well... it's not at the top of my priority list ... ;-)
>>> 
>> 
>> Sure, that comes under the 'fast enough' issue. But even Fortran might
>> be too slow here?
>> 
>> For zooming Mandelbrot I'd use PyOpenGL and a GLSL fragment shader
>> (which would be a text string in Python):
>> 
>> madelbrot_fragment_shader = """
>> 
>> uniform sampler1D tex;
>> uniform vec2 center;
>> uniform float scale;
>> uniform int iter;
>> void main() {
>>  vec2 z, c;
>>  c.x = 1. * (gl_TexCoord[0].x - 0.5) * scale - center.x;
>>  c.y = (gl_TexCoord[0].y - 0.5) * scale - center.y;
>>  int i;
>>  z = c;
>>  for(i=0; i<iter; i++) {
>>  float x = (z.x * z.x - z.y * z.y) + c.x;
>>  float y = (z.y * z.x + z.x * z.y) + c.y;
>>  if((x * x + y * y) > 4.0) break;
>>  z.x = x;
>>  z.y = y;
>>  }
>>  gl_FragColor = texture1D(tex, (i == iter ? 0.0 : float(i)) / 100.0);
>> }
>> 
>> """
>> 
>> The rest is just boiler-plate OpenGL...
>> 
>> Sources:
>> 
>> http://nuclear.mutantstargoat.com/articles/sdr_fract/
>> 
>> http://pyopengl.sourceforge.net/context/tutorials/shader_1.xhtml
> 
> Off-topic comment: Or use some algorithmic cleverness, see [1]. I recall Xaos 
> had interactive, extremely fast and fluid fractal zooming more than 10 (or 15?) 
> years ago (-> on laughable hardware by today's standards).
> 
> r.
> 
> [1] http://wmi.math.u-szeged.hu/xaos/doku.php



Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-23 Thread Robert Cimrman
On 01/23/12 13:51, Sturla Molden wrote:
> On 23.01.2012 13:09, Sebastian Haase wrote:
>>
>> I would think that interactive zooming would be quite nice
>> ("illuminating")   and for that 13 secs would not be tolerable
>> Well... it's not at the top of my priority list ... ;-)
>>
>
> Sure, that comes under the 'fast enough' issue. But even Fortran might
> be too slow here?
>
> For zooming Mandelbrot I'd use PyOpenGL and a GLSL fragment shader
> (which would be a text string in Python):
>
> madelbrot_fragment_shader = """
>
> uniform sampler1D tex;
> uniform vec2 center;
> uniform float scale;
> uniform int iter;
> void main() {
>   vec2 z, c;
>   c.x = 1. * (gl_TexCoord[0].x - 0.5) * scale - center.x;
>   c.y = (gl_TexCoord[0].y - 0.5) * scale - center.y;
>   int i;
>   z = c;
>   for(i=0; i<iter; i++) {
>   float x = (z.x * z.x - z.y * z.y) + c.x;
>   float y = (z.y * z.x + z.x * z.y) + c.y;
>   if((x * x + y * y) > 4.0) break;
>   z.x = x;
>   z.y = y;
>   }
>   gl_FragColor = texture1D(tex, (i == iter ? 0.0 : float(i)) / 100.0);
> }
>
> """
>
> The rest is just boiler-plate OpenGL...
>
> Sources:
>
> http://nuclear.mutantstargoat.com/articles/sdr_fract/
>
> http://pyopengl.sourceforge.net/context/tutorials/shader_1.xhtml

Off-topic comment: Or use some algorithmic cleverness, see [1]. I recall Xaos 
had interactive, extremely fast and fluid fractal zooming more than 10 (or 15?) 
years ago (-> on laughable hardware by today's standards).

r.

[1] http://wmi.math.u-szeged.hu/xaos/doku.php


Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-23 Thread Sturla Molden
On 23.01.2012 13:09, Sebastian Haase wrote:
>
> I would think that interactive zooming would be quite nice
> ("illuminating")   and for that 13 secs would not be tolerable
> Well... it's not at the top of my priority list ... ;-)
>

Sure, that comes under the 'fast enough' issue. But even Fortran might 
be too slow here?

For zooming Mandelbrot I'd use PyOpenGL and a GLSL fragment shader 
(which would be a text string in Python):

madelbrot_fragment_shader = """

uniform sampler1D tex;
uniform vec2 center;
uniform float scale;
uniform int iter;
void main() {
 vec2 z, c;
 c.x = 1. * (gl_TexCoord[0].x - 0.5) * scale - center.x;
 c.y = (gl_TexCoord[0].y - 0.5) * scale - center.y;
 int i;
 z = c;
 for(i=0; i<iter; i++) {
 float x = (z.x * z.x - z.y * z.y) + c.x;
 float y = (z.y * z.x + z.x * z.y) + c.y;
 if((x * x + y * y) > 4.0) break;
 z.x = x;
 z.y = y;
 }
 gl_FragColor = texture1D(tex, (i == iter ? 0.0 : float(i)) / 100.0);
}

"""

The rest is just boiler-plate OpenGL...

Sources:

http://nuclear.mutantstargoat.com/articles/sdr_fract/

http://pyopengl.sourceforge.net/context/tutorials/shader_1.xhtml


Sturla


Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-23 Thread Dag Sverre Seljebotn
On 01/23/2012 12:23 PM, Sturla Molden wrote:
> On 23.01.2012 10:04, Dag Sverre Seljebotn wrote:
>> On 01/23/2012 05:35 AM, Jonathan Rocher wrote:
>>> Hi all,
>>>
>>> I was reading this while learning about Pytables in more details and the
>>> origin of its efficiency. This sounds like a problem where out of core
>>> computation using pytables would shine since the dataset doesn't fit
>>> into CPU cache: http://www.pytables.org/moin/ComputingKernel. Of course
>>> C/Cythonizing the problem would be another good way...
>> Well, since the data certainly fits in RAM, one would use numexpr
>> directly (which is what pytables also uses).
>>
>>
>
> Personally I feel this debate is asking the wrong question.
>
> It is not uncommon for NumPy code to be 16x slower than C or Fortran.
> But that is not really interesting.
>
> This is what I think matters:
>
> - Is the NumPy code FAST ENOUGH?  If not, then go ahead and optimize. If
> it's fast enough, then just leave it.
>
> In this case, it seems Python takes ~13 seconds compared to ~1 second
> for Fortran. Sure, those extra 12 seconds could be annoying. But how
> much coding time should we spend to avoid them? 15 minutes? An hour? Two
> hours?
>
> Taking the time spent optimizing into account, then perhaps Python is
> 'faster' anyway? It is common to ask what is fastest for the computer.
> But we should really be asking what is fastest for our selves.
>
> For example: I have a computation that will take a day in Fortran or a
> month in Python (estimated). And I am going to run this code several
> times (20 or so, I think). In this case, yes, coding the bottlenecks in
> Fortran matters to me. But 13 seconds versus 1 second? I find that
> hardly interesting.

You, me, Ondrej, and many more are happy to learn 4 languages and use 
them where they are most appropriate.

But most scientists only want to learn and use one tool. And most 
scientists have both problems where performance doesn't matter, and 
problems where it does. So as long as examples like this exists, many 
people will prefer Fortran for *all* their tasks.

(Of course, that's why I got involved in Cython...)

Dag Sverre


Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-23 Thread Sebastian Haase
On Mon, Jan 23, 2012 at 12:23 PM, Sturla Molden  wrote:
> On 23.01.2012 10:04, Dag Sverre Seljebotn wrote:
>> On 01/23/2012 05:35 AM, Jonathan Rocher wrote:
>>> Hi all,
>>>
>>> I was reading this while learning about Pytables in more details and the
>>> origin of its efficiency. This sounds like a problem where out of core
>>> computation using pytables would shine since the dataset doesn't fit
>>> into CPU cache: http://www.pytables.org/moin/ComputingKernel. Of course
>>> C/Cythonizing the problem would be another good way...
>> Well, since the data certainly fits in RAM, one would use numexpr
>> directly (which is what pytables also uses).
>>
>>
>
> Personally I feel this debate is asking the wrong question.
>
> It is not uncommon for NumPy code to be 16x slower than C or Fortran.
> But that is not really interesting.
>
> This is what I think matters:
>
> - Is the NumPy code FAST ENOUGH?  If not, then go ahead and optimize. If
> it's fast enough, then just leave it.
>
> In this case, it seems Python takes ~13 seconds compared to ~1 second
> for Fortran. Sure, those extra 12 seconds could be annoying. But how
> much coding time should we spend to avoid them? 15 minutes? An hour? Two
> hours?
>
> Taking the time spent optimizing into account, then perhaps Python is
> 'faster' anyway? It is common to ask what is fastest for the computer.
> But we should really be asking what is fastest for our selves.
>
> For example: I have a computation that will take a day in Fortran or a
> month in Python (estimated). And I am going to run this code several
> times (20 or so, I think). In this case, yes, coding the bottlenecks in
> Fortran matters to me. But 13 seconds versus 1 second? I find that
> hardly interesting.
>
> Sturla


I would think that interactive zooming would be quite nice
("illuminating"), and for that 13 secs would not be tolerable.
Well... it's not at the top of my priority list ... ;-)

-Sebastian Haase


Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-23 Thread Sturla Molden
On 23.01.2012 10:04, Dag Sverre Seljebotn wrote:
> On 01/23/2012 05:35 AM, Jonathan Rocher wrote:
>> Hi all,
>>
>> I was reading this while learning about Pytables in more details and the
>> origin of its efficiency. This sounds like a problem where out of core
>> computation using pytables would shine since the dataset doesn't fit
>> into CPU cache: http://www.pytables.org/moin/ComputingKernel. Of course
>> C/Cythonizing the problem would be another good way...
> Well, since the data certainly fits in RAM, one would use numexpr
> directly (which is what pytables also uses).
>
>

Personally I feel this debate is asking the wrong question.

It is not uncommon for NumPy code to be 16x slower than C or Fortran. 
But that is not really interesting.

This is what I think matters:

- Is the NumPy code FAST ENOUGH?  If not, then go ahead and optimize. If 
it's fast enough, then just leave it.

In this case, it seems Python takes ~13 seconds compared to ~1 second 
for Fortran. Sure, those extra 12 seconds could be annoying. But how 
much coding time should we spend to avoid them? 15 minutes? An hour? Two 
hours?

Taking the time spent optimizing into account, then perhaps Python is 
'faster' anyway? It is common to ask what is fastest for the computer. 
But we should really be asking what is fastest for our selves.

For example: I have a computation that will take a day in Fortran or a 
month in Python (estimated). And I am going to run this code several 
times (20 or so, I think). In this case, yes, coding the bottlenecks in 
Fortran matters to me. But 13 seconds versus 1 second? I find that 
hardly interesting.

Sturla



[Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-23 Thread David Warde-Farley
A colleague has run into this weird behaviour with NumPy 1.6.1, EPD 7.1-2, on 
Linux (Fedora Core 14) 64-bit:

> a = numpy.array(numpy.random.randint(256,size=(500,972)),dtype='uint8')
> b = numpy.random.randint(500,size=(4993210,))
> c = a[b]

It seems c is not getting filled in full, namely:

> In [14]: c[100:].sum()
> Out[14]: 0

I haven't been able to reproduce this quite yet, I'll try to find a machine 
with sufficient memory tomorrow. But does anyone have any insight in the mean 
time? It smells like some kind of integer overflow bug.

Thanks,

David


Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-23 Thread Dag Sverre Seljebotn
On 01/23/2012 05:35 AM, Jonathan Rocher wrote:
> Hi all,
>
> I was reading this while learning about Pytables in more details and the
> origin of its efficiency. This sounds like a problem where out of core
> computation using pytables would shine since the dataset doesn't fit
> into CPU cache: http://www.pytables.org/moin/ComputingKernel. Of course
> C/Cythonizing the problem would be another good way...

Well, since the data certainly fits in RAM, one would use numexpr 
directly (which is what pytables also uses).
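
(To make that concrete, here is a minimal numexpr version of the escape-time 
loop -- my own sketch, not code from the thread; it iterates every point each 
pass rather than using Dan Goodman's trick, mentioned below, of only keeping 
the points that have not escaped yet.)

import numpy as np
import numexpr as ne

def mandel_counts(cx, cy, maxiter=100):
    x = np.zeros_like(cx)
    y = np.zeros_like(cy)
    counts = np.zeros(cx.shape, dtype=np.int64)
    for _ in range(maxiter):
        # one Mandelbrot step, evaluated by numexpr over the whole grid
        x, y = (ne.evaluate("x*x - y*y + cx"),
                ne.evaluate("2*x*y + cy"))
        counts += ne.evaluate("x*x + y*y <= 4.0")   # still-bounded points get one more count
    return counts

cx, cy = np.meshgrid(np.linspace(-2.0, 1.0, 600), np.linspace(-1.5, 1.5, 600))
counts = mandel_counts(cx, cy)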

Dag Sverre

>
> HTH,
> Jonathan
>
> 2012/1/22 Ondřej Čertík  >
>
> On Sun, Jan 22, 2012 at 3:13 AM, Sebastian Haase
> <seb.ha...@gmail.com> wrote:
>  > How does the algorithm and timing compare to this one:
>  >
>  >
> 
> http://code.google.com/p/priithon/source/browse/Priithon/mandel.py?spec=svna6117f5e81ec00abcfb037f0f9da2937bb2ea47f&r=a6117f5e81ec00abcfb037f0f9da2937bb2ea47f
> 
> 
>  >
>  > The author of original version is  Dan Goodman
>  > # FAST FRACTALS WITH PYTHON AND NUMPY
>
> Thanks Sebastian. This one is much faster -- 2.7s on my laptop with
> the same dimensions/iterations.
>
> It uses better data structures -- it only keeps track of points that
> still need to be iterated -- very clever.
> If I have time, I'll try to provide an equivalent Fortran version too,
> for comparison.
>
> Ondrej
>
>
>
>
> --
> Jonathan Rocher, PhD
> Scientific software developer
> Enthought, Inc.
> jroc...@enthought.com 
> 1-512-536-1057
> http://www.enthought.com 
>
>
>
