Re: [Numpy-discussion] iteration slowing, no increase in memory

2009-09-10 Thread Chad Netzer
On Thu, Sep 10, 2009 at 10:03 AM, John [H2O] wrote:

> It runs very well for the first few iterations, but then slows tremendously
> - there is nothing significantly different about the files or directory in
> which it slows. I've monitored the memory use, and it is not increasing.

The memory use itself is not a good indicator, as modern operating
systems (Linux, Windows, Mac, et al) generally use all available free
memory as a disk cache.  So the system memory use may remain quite
steady while old data is flushed and new data paged in.  The first few
iterations could be "fast" if they are already in memory, although the
behavior should probably change on repeated runs.

If you reboot, then immediately run the script, is it slow on all
directories?  Or if you can't reboot, can you at least remount the
filesystem (which should flush all the cached data and metadata)?  Or,
for recent Linux kernels:

http://linux-mm.org/Drop_Caches

Are other operations slow/fast for the different directories, such as
tar'ing them up, or "du -s"?  Can you verify the integrity of the
drive with SMART tools?  If it's Linux, can you get data on the actual
disk device I/O (using "iostat" or "vmstat")?

Or you could test by iterating over the same directory repeatedly; it
should be fast after the first iteration.  Then move to a "problem"
directory and see if the first iteration only is slow, or if all
iterations are slow.
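
To make that test concrete, something along these lines (a rough sketch;
"problem_dir" is a placeholder, and reading whole files is just one way to
touch the data):

import os
import time

def walk_once(path):
    # Read every file in the directory once; return elapsed time and bytes.
    t0 = time.time()
    nbytes = 0
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.isfile(full):
            f = open(full, 'rb')
            nbytes += len(f.read())
            f.close()
    return time.time() - t0, nbytes

# A much faster second pass points at the OS disk cache, not at your code.
for attempt in (1, 2):
    elapsed, nbytes = walk_once("problem_dir")
    print("pass %d: %.2f s for %d bytes" % (attempt, elapsed, nbytes))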

-C
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] iteration slowing, no increase in memory

2009-09-10 Thread Robert Kern
On Thu, Sep 10, 2009 at 12:03, John [H2O] wrote:
>
> Hello,
>
> I have a routine that is iterating through a series of directories, loading
> files, plotting, then moving on...
>
> It runs very well for the first few iterations, but then slows tremendously
> - there is nothing significantly different about the files or directory in
> which it slows.

One thing you can do to verify this is to change the order of
iteration. You will also want to profile your code. Then you can see
what is taking up so much time.

  http://docs.python.org/library/profile
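
For instance, a minimal sketch (process_directory here is a stand-in for
your per-directory routine):

import cProfile
import pstats

def process_directory(path):
    # placeholder for the real routine: load files, plot, move on
    pass

# Profile one pass, then show the ten most expensive calls by cumulative time.
cProfile.run('process_directory("some/dir")', 'dirstats.prof')
pstats.Stats('dirstats.prof').sort_stats('cumulative').print_stats(10)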

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] iteration slowing, no increase in memory

2009-09-10 Thread John [H2O]

Hello,

I have a routine that is iterating through a series of directories, loading
files, plotting, then moving on...

It runs very well for the first few iterations, but then slows tremendously
- there is nothing significantly different about the files or directory in
which it slows. I've monitored the memory use, and it is not increasing.
I've looked at what other possible explanations there may be, but I am at a
loss.

Does anyone have suggestions for where to start looking? I recognize that
without the code it is difficult, but I don't know that there is any one
'piece' of code to post, and it's probably not of interest for me to post the
entire script here.

Thanks!
-- 
View this message in context: 
http://www.nabble.com/iteration-slowing%2C-no-increase-in-memory-tp25387205p25387205.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Rohit Garg
> Yes. However, it is worth making the distinction between
> embarrassingly parallel problems and SIMD problems. Not all
> embarrassingly parallel problems are SIMD-capable. GPUs do SIMD, not
> generally embarrassing problems.

GPUs exploit both dimensions of parallelism: SIMD (aka vectorization) and
parallelization (aka multicore). And yeah, 99.9% of the time branching on a
GPU should be the least/last of your worries if your problem is
data-parallel. There are much worse things than branching.

As for SIMD special functions, branching can certainly be eliminated.
I have written/come across some special functions myself, and I do not
know of any case that is difficult to do efficiently on a GPU.
Certainly, I know less than some folks around here. Maybe you can
contribute a counterexample to this discussion.

Regards,

-- 
Rohit Garg

http://rpg-314.blogspot.com/

Senior Undergraduate
Department of Physics
Indian Institute of Technology
Bombay
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Robert Kern
On Thu, Sep 10, 2009 at 07:28, Francesc Alted wrote:
> A Thursday 10 September 2009 11:37:24 Gael Varoquaux escrigué:
>> On Thu, Sep 10, 2009 at 11:29:49AM +0200, Francesc Alted wrote:
>> > The point is: are GPUs prepared to compete with general-purpose CPUs
>> > in all-road operations, like evaluating transcendental functions,
>> > conditionals, all of this with a rich set of data types? I would like to
>> > believe that this is the case, but I don't think so (at least not yet).
>>
>> I believe (this is very foggy) that GPUs can implement non-trivial logic
>> on their base processing unit, so that conditionals and transcendental
>> functions are indeed possible. Where it gets hard is when you don't have
>> problems that can be expressed in an embarrassingly parallel manner.
>
> But NumPy is about embarrassingly parallel calculations, right? I mean:
>
> a = np.cos(b)
>
> where b is a 1x1 matrix is *very* embarrassing (in the parallel
> meaning of the term ;-)

Yes. However, it is worth making the distinction between
embarrassingly parallel problems and SIMD problems. Not all
embarrassingly parallel problems are SIMD-capable. GPUs do SIMD, not
generally embarrassing problems. If there are branches, as would be
necessary for many special functions, the GPU does not perform as
well. Basically, every unit has to do both branches because they all
must do the same instruction at the same time, even though the data on
each unit only gets processed by one branch.

cos() is easy. Or at least it is so necessary to graphics computing that
it is already a primitive in all (most?) GPU languages. Googling
around shows SIMD code for the basic transcendental functions. I
believe you have to code them differently than you would on a CPU.
Other special functions would simply be hard to do efficiently.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Rohit Garg
> I think whatever is supported by the underlying CPU, whether it is extended
> double precision (12 bytes) or quad precision (16 bytes).
>
> --
> Francesc Alted

Classic 64-bit CPUs support neither.



-- 
Rohit Garg

http://rpg-314.blogspot.com/

Senior Undergraduate
Department of Physics
Indian Institute of Technology
Bombay
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Francesc Alted
A Thursday 10 September 2009 15:51:15 Rohit Garg escrigué:
> Apart from float and double, which floating point formats are
> supported by numpy?

I think whatever is supported by the underlying CPU, whether it is extended 
double precision (12 bytes) or quad precision (16 bytes).
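
For what it's worth, a quick way to check what a given build offers (a
sketch; the output depends on platform and compiler):

import numpy as np

# np.longdouble maps to the C compiler's long double: 80-bit extended
# (stored in 12 or 16 bytes) on most x86 builds, plain double elsewhere.
for name in ("float32", "float64", "longdouble"):
    dt = np.dtype(getattr(np, name))
    print("%-10s -> %2d bytes, ~%d decimal digits"
          % (name, dt.itemsize, np.finfo(dt).precision))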

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Rohit Garg
Apart from float and double, which floating point formats are
supported by numpy?

On Thu, Sep 10, 2009 at 7:09 PM, Bruce Southey wrote:
> On 09/10/2009 07:40 AM, Francesc Alted wrote:
> > A Thursday 10 September 2009 14:36:16 Rohit Garg escrigué:
> > > > That's nice to see. I think I'll change my mind if someone could perform
> > > > a vector-vector multiplication (an operation that is typically
> > > > memory-bounded)
> > >
> > > You mean a dot product?
> >
> > Whatever, dot product or element-wise product. Both are memory-bounded.
> >
> > --
> > Francesc Alted
>
> As Francesc previously said, these need to be at least in double precision
> and really should also be in all the floating point precisions used by numpy
> on supported platforms. Based on the various boinc project comments, many
> graphics cards do not natively support double precision, so you can get an
> inflated speedup just because of the difference in precision.
>
> Bruce



-- 
Rohit Garg

http://rpg-314.blogspot.com/

Senior Undergraduate
Department of Physics
Indian Institute of Technology
Bombay
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Behavior from a change in dtype?

2009-09-10 Thread Skipper Seabold
On Tue, Sep 8, 2009 at 12:53 PM, Christopher Barker wrote:
> Skipper Seabold wrote:
>> Hmm, okay, well I came across this in trying to create a recarray like
>> data2 below, so I guess I should just combine the two questions.
>
> The key to understanding this is to understand what is going on under the
> hood in numpy. Travis O. gave a nice intro in an Enthought webcast a few
> months ago -- I'm not sure if those are recorded and up on the web, but
> it's worth a look. It was also discussed in the advanced numpy tutorial
> at SciPy this year -- and that is up on the web:
>
> http://www.archive.org/details/scipy09_advancedTutorialDay1_1
>

Thanks.  I wasn't able to watch the Enthought webcasts on Linux, but
I've seen a few of the video tutorials.  What a great resource.  I'm
really glad this came together.

>
> Anyway, here is my minimal attempt to clarify:
>
>> import numpy as np
>>
>> data = np.array([[10.75, 1, 1],[10.39, 0, 1],[18.18, 0, 1]])
>
> here we are using a standard array constructor -- it will look at the
> data you are passing in (a mixture of python floats and ints), and
> decide that they can best be represented by a numpy array of float64s.
>
> numpy arrays are essentially a pointer to a block of memory, and a bunch
> of attributes that describe how the bytes pointed to are to be
> interpreted. In this case, they are 9 C doubles, representing a 3x3
> array of doubles.
>
>> dt = np.dtype([('var1', '<f8'), ('var2', '<i8'), ('var3', '<i8')])
>
> (NOTE: I'm on a big-endian machine, so I've used:
> dt = np.dtype([('var1', '>f8'), ('var2', '>i8'), ('var3', '>i8')])
> )
>
> This is a data type descriptor that is analogous to a C struct,
> containing a float64 and two int64s
>
>> # Doesn't work, raises TypeError: expected a readable buffer object
>> data2 = data2.view(np.recarray)
>> data2.astype(dt)
>
> I don't understand that error either, but recarrays are about adding
> the ability to access parts of a structured array by name; you still
> need the dtype to specify the types and names. This does seem to work
> (though it may not be giving the results you expect):
>
> In [19]: data2 = data.copy()
> In [20]: data2 = data2.view(np.recarray)
> In [21]: data2 = data2.view(dtype=dt)
>
> or, indeed in the opposite order:
>
> In [24]: data2 = data.copy()
> In [25]: data2 = data2.view(dtype=dt)
> In [26]: data2 = data2.view(np.recarray)
>
>
> So you've done two operations, one is to change the dtype -- the
> interpretation of the bytes in the data buffer, and one is to make this
> a recarray, which allows you to access the "fields" by name:
>
> In [31]: data2['var1']
> Out[31]:
> array([[ 10.75],
>        [ 10.39],
>        [ 18.18]])
>
>> # Works without error (?) with unexpected result
>> data3 = data3.view(np.recarray)
>> data3.dtype = dt
>
> That all depends on what you expect! I used "view" above, 'cause I think
> there is less magic, though it's the same thing. I suppose changing the
> dtype in place like that is a tiny bit more efficient -- if you use
> .view(), you are creating a new array pointing to the same data, rather
> than changing the array in place.
>
> But anyway, the dtype describes how the bytes in the memory block are to
> be interpreted; changing it by assigning the attribute or using .view()
> changes the interpretation, but does not change the bytes themselves at
> all, so in this case, you are taking the 8 bytes representing a float64
> of value 1.0, and interpreting those bytes as an 8-byte int -- which is
> going to give you garbage, essentially.
>
>> # One correct (though IMHO unintuitive) way
>> data = np.rec.fromarrays(data.swapaxes(1,0), dtype=dt)
>
> This is using the np.rec.fromarrays constructor to build a new record
> array with the dtype you want, the data is being converted and copied,
> it won't change the original at all:
>
> So the question remains -- is there a way to convert the floats in
> "data" to ints in place?
>

Ah, ok.  I understand roughly the above.  But, yes, this is my question.

>
> This seems to work:
> In [78]: data = np.array([[10.75, 1, 1],[10.39, 0, 1],[18.18, 0, 1]])
>
> In [79]: data[:,1:3] = data[:,1:3].astype('>i8').view(dtype='>f8')
>
> In [80]: data.dtype = dt
>
> It is making a copy of the integer data in the process -- but I think that
> is required, as you are changing the values, not just the interpretation
> of the bytes. I suppose we could have an "astype_inplace" method, but
> that would only work if the two types were the same size, and I'm not
> sure it's a common enough use to be worth it.
>
> What is your real use case? I suspect that what you really should do
> here is define your dtype first, then create the array of data:
>

I have a function that eventually appends an ndarray of floats that
are 0 to 1 to a recarray, and I ran into it trying to debug.  Then I
was just curious about the modification in place.

> data = np.array([(10.75, 1, 1), (10.39, 0, 1), (18.18, 0, 1)], dtype=dt)
>
> which does require that you use tuples, rather than lists to hold the
> "structs".
>
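
Putting the pieces of this thread together, a self-contained sketch (sample
values from above; little-endian dtypes assumed, use '>f8'/'>i8' on a
big-endian machine):

import numpy as np

dt = np.dtype([('var1', '<f8'), ('var2', '<i8'), ('var3', '<i8')])
data = np.array([[10.75, 1, 1], [10.39, 0, 1], [18.18, 0, 1]])

# Convert columns 1-2 to int64 *values*, then reinterpret those bytes as
# float64 so they can be copied back into the float array bit-for-bit.
data[:, 1:3] = data[:, 1:3].astype('<i8').view('<f8')

# Reinterpret the whole buffer with the struct-like dtype (the last axis
# collapses, so the shape becomes (3, 1)) and add named-field access.
data.dtype = dt
rec = data.view(np.recarray)
print(rec.var1.ravel())   # [ 10.75  10.39  18.18]
print(rec.var2.ravel())   # [1 0 1]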


Re: [Numpy-discussion] Adding a 2D with a 1D array...

2009-09-10 Thread Ruben Salvador
Well...you are right, sorry, I just thought the 'np.shape(offspr)' result
would be enough. Obviously not!

offspr wasn't actually a numpy array, but a Python list. I'm sorry for the
inconvenience, but I didn't realize...I'm just changing my code so that I
only use numpy arrays, and forgot to change the definition of offspr :S It's
always better not to hurry and to check the changes more deeply.

I'll put some time into profiling the code properly later on...now I just
need to finish this! Any pointers on where to start with 'sane profiling
techniques'? I love reading details and explanations, but don't have the
time to go through hundreds of pages right now, so...some good trade-off
between practical and extensive docs?

Thanks everybody!

On Thu, Sep 10, 2009 at 2:32 PM, Francesc Alted wrote:
> A Thursday 10 September 2009 14:22:57 Dag Sverre Seljebotn escrigué:
> > > > (Also a guard in timeit against CPU frequency scaling errors would be
> > > > great :-) Like simply outputting a warning if frequency scaling is
> > > > detected).
> > >
> > > Sorry, I don't get this one.
> >
> > I had some trouble getting reliable benchmarks on my own computer until
> > I realised that the power-saving capabilities of my CPU down-throttled
> > the clock speed when it was not in use. Thus if I did two calls to
> > timeit right after one another, the second would always report lower
> > runtime, because the first one started at a lower clock speed.
>
> :-) Good point
>
> > Changing a BIOS setting solved this, but it might be a gotcha which e.g.
> > timeit and IPython could report (they could just inspect the CPU
> > information and emit a warning -- or, do something to throttle up the
> > CPU to full speed first).
>
> --
> Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Bruce Southey

On 09/10/2009 07:40 AM, Francesc Alted wrote:
> A Thursday 10 September 2009 14:36:16 Rohit Garg escrigué:
> > > That's nice to see. I think I'll change my mind if someone could perform
> > > a vector-vector multiplication (an operation that is typically
> > > memory-bounded)
> >
> > You mean a dot product?
>
> Whatever, dot product or element-wise product. Both are memory-bounded.
>
> --
> Francesc Alted

As Francesc previously said, these need to be at least in double precision 
and really should also be in all the floating point precisions used 
by numpy on supported platforms. Based on the various boinc project 
comments, many graphics cards do not natively support double precision, 
so you can get an inflated speedup just because of the difference in 
precision.


Bruce
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Francesc Alted
A Thursday 10 September 2009 14:36:16 Rohit Garg escrigué:
> > That's nice to see. I think I'll change my mind if someone could perform
> > a vector-vector multiplication (an operation that is typically
> > memory-bounded)
>
> You mean a dot product?

Whatever, dot product or element-wise product.  Both are memory-bounded.

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Rohit Garg
> That's nice to see. I think I'll change my mind if someone could perform a
> vector-vector multiplication (an operation that is typically memory-bounded)

You mean a dot product?

-- 
Rohit Garg

http://rpg-314.blogspot.com/

Senior Undergraduate
Department of Physics
Indian Institute of Technology
Bombay
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Rohit Garg
> a = np.cos(b)
>
> where b is a 1x1 matrix is *very* embarrassing (in the parallel
> meaning of the term ;-)

On this operation, GPUs will eat up CPUs like a pack of piranhas. :)

-- 
Rohit Garg

http://rpg-314.blogspot.com/

Senior Undergraduate
Department of Physics
Indian Institute of Technology
Bombay
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adding a 2D with a 1D array...

2009-09-10 Thread Francesc Alted
A Thursday 10 September 2009 14:22:57 Dag Sverre Seljebotn escrigué:
> > > (Also a guard in timeit against CPU frequency scaling errors would be
> > > great :-) Like simply outputting a warning if frequency scaling is
> > > detected).
> >
> > Sorry, I don't get this one.
>
> I had some trouble getting reliable benchmarks on my own computer until
> I realised that the power-saving capabilities of my CPU down-throttled
> the clock speed when it was not in use. Thus if I did two calls to
> timeit right after one another, the second would always report lower
> runtime, because the first one started at a lower clock speed.

:-)  Good point

>
> Changing a BIOS setting solved this, but it might be a gotcha which e.g.
> timeit and IPython could report (they could just inspect the CPU
> information and emit a warning -- or, do something to throttle up the
> CPU to full speed first).

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Francesc Alted
A Thursday 10 September 2009 11:37:24 Gael Varoquaux escrigué:
> On Thu, Sep 10, 2009 at 11:29:49AM +0200, Francesc Alted wrote:
> > The point is: are GPUs prepared to compete with general-purpose CPUs
> > in all-road operations, like evaluating transcendental functions,
> > conditionals, all of this with a rich set of data types? I would like to
> > believe that this is the case, but I don't think so (at least not yet).
>
> I believe (this is very foggy) that GPUs can implement non-trivial logic
> on their base processing unit, so that conditionals and transcendental
> functions are indeed possible. Where it gets hard is when you don't have
> problems that can be expressed in an embarrassingly parallel manner.

But NumPy is about embarrassingly parallel calculations, right?  I mean:

a = np.cos(b)

where b is a 1x1 matrix is *very* embarrassing (in the parallel 
meaning of the term ;-)

Can anyone here say how the above operation can be done with GPUs?  (And 
providing some timings would be really great :)
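
For concreteness, something like the following PyCUDA sketch is what I have
in mind (assuming PyCUDA is installed; any honest timing must also count
the host/device transfers):

import numpy as np
import pycuda.autoinit           # creates a CUDA context on import
import pycuda.gpuarray as gpuarray
import pycuda.cumath as cumath

b = np.random.rand(1000, 1000).astype(np.float32)   # sizes are arbitrary

b_gpu = gpuarray.to_gpu(b)       # host -> device copy
a_gpu = cumath.cos(b_gpu)        # element-wise cos on the GPU
a = a_gpu.get()                  # device -> host copy

print(np.allclose(a, np.cos(b), atol=1e-5))   # True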

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Francesc Alted
A Thursday 10 September 2009 11:40:48 Sturla Molden escrigué:
> Francesc Alted skrev:
> > Numexpr already uses the Python parser, instead of build a new one.
> > However the bytecode emitted after the compilation process is
> > different, of course.
> >
> > Also, I don't see the point in requiring immutable buffers. Could you
> > develop this further?
>
> If you do lazy evaluation, a function like this could fail without
> immutable buffers:
>
> def foobar(x):
>     y = a*x[:] + b
>     x[0] = 0   # affects y and anything else depending on x
>     return y
>
> Immutable buffers are not required, one could document the oddity, but
> coding would be very error-prone.
>

Mmh, I don't see a problem here if the order of operations is kept untouched 
(and you normally want to do this).  But I'm not an expert on 'lazy 
evaluation', so you may want to just ignore my comments ;-)

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adding a 2D with a 1D array...

2009-09-10 Thread Dag Sverre Seljebotn
Francesc Alted wrote:
> A Thursday 10 September 2009 13:45:10 Dag Sverre Seljebotn escrigué:
> > Do you see any issues with this approach: Add a flag to timeit to provide
> > two modes:
> >
> > a) Do an initial run which is always not included in timings (in fact,
> > as it gets "min" and not "mean", I think this is the current behaviour)
>
> Yup, you are right, it is 'min'. In fact, this is why timeit normally
> 'forgets' about data transmission times (with a 'mean' the effect is
> very similar anyways).
>
> > b) Do something else between every run which should clear out the cache
> > (like, just do another big dummy calculation).
>
> Yeah. In fact, you can simulate this behaviour by running two instances
> of timeit: one with your code + big dummy calculation, and the other
> with just the big dummy calculation. Subtract both numbers and you will
> have a better guess for non-cached calculations.
>
> > (Also a guard in timeit against CPU frequency scaling errors would be
> > great :-) Like simply outputting a warning if frequency scaling is
> > detected).
>
> Sorry, I don't get this one.


I had some trouble getting reliable benchmarks on my own computer until 
I realised that the power-saving capabilities of my CPU down-throttled 
the clock speed when it was not in use. Thus if I did two calls to 
timeit right after one another, the second would always report lower 
runtime, because the first one started at a lower clock speed.

Changing a BIOS setting solved this, but it might be a gotcha which e.g. 
timeit and IPython could report (they could just inspect the CPU 
information and emit a warning -- or, do something to throttle up the 
CPU to full speed first).

-- 
Dag Sverre
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adding a 2D with a 1D array...

2009-09-10 Thread Francesc Alted
A Thursday 10 September 2009 13:45:10 Dag Sverre Seljebotn escrigué:
> Francesc Alted wrote:
> > A Wednesday 09 September 2009 20:17:20 Dag Sverre Seljebotn escrigué:
> > > Ruben Salvador wrote:
> > > > Your results are what I expected...but. This code is called from my
> > > > main program, and what I have in there (output array already created
> > > > for both cases) is:
> > > >
> > > > print "lambd", lambd
> > > > print "np.shape(a)", np.shape(a)
> > > > print "np.shape(r)", np.shape(r)
> > > > print "np.shape(offspr)", np.shape(offspr)
> > > > t = clock()
> > > > for i in range(lambd):
> > > >     offspr[i] = r[i] + a[i]
> > > > t1 = clock() - t
> > > > print "For loop time ==> %.8f seconds" % t1
> > > > t2 = clock()
> > > > offspr = r + a[:,None]
> > > > t3 = clock() - t2
> > > > print "Pythonic time ==> %.8f seconds" % t3
> > > >
> > > > The results I obtain are:
> > > >
> > > > lambd 8
> > > > np.shape(a) (8,)
> > > > np.shape(r) (8, 26)
> > > > np.shape(offspr) (8, 26)
> > > > For loop time ==> 0.34528804 seconds
> > > > Pythonic time ==> 0.35956192 seconds
> > > >
> > > > Maybe I'm not measuring properly, so, how should I do it?
> > >
> > > Like Luca said, you are not including the creation time of offspr in
> > > the for-loop version. A fairer comparison would be
> > >
> > > offspr[...] = r + a[:, None]
> > >
> > > Even fairer (one less temporary copy):
> > >
> > > offspr[...] = r
> > > offspr += a[:, None]
> > >
> > > Of course, see how the trend is for larger N as well.
> > >
> > > Also your timings are a bit crude (though this depends on how many
> > > times you ran your script to check :-)). To get better measurements,
> > > use the timeit module, or (easier) IPython and the %timeit command.
> >
> > Oh well, the art of benchmarking :)
> >
> > The timeit module normally lets you get less jitter in timings because
> > it loops on doing the same operation repeatedly and takes the mean.
> > However, this has the drawback of filling your cache with the datasets
> > (or part of them) so, in the end, your measurements with timeit do not
> > take into account the time to transmit the data from main memory into
> > the CPU caches, and that may not be what you want to measure.
>
> Do you see any issues with this approach: Add a flag to timeit to provide
> two modes:
>
> a) Do an initial run which is always not included in timings (in fact,
> as it gets "min" and not "mean", I think this is the current behaviour)

Yup, you are right, it is 'min'.  In fact, this is why timeit normally 
'forgets' about data transmission times (with a 'mean' the effect is very 
similar anyways).

> b) Do something else between every run which should clear out the cache
> (like, just do another big dummy calculation).

Yeah.  In fact, you can simulate this behaviour by running two instances of 
timeit: one with your code + big dummy calculation, and the other with just 
the big dummy calculation.  Subtract both numbers and you will have a better 
guess for non-cached calculations.
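
In code, the trick looks roughly like this (a sketch; the sizes are
arbitrary, the dummy just has to be bigger than the CPU caches):

import timeit

setup = """
import numpy as np
a = np.random.rand(1000, 1000)
dummy = np.random.rand(4000, 4000)   # big enough to evict 'a' from cache
"""
t_both = min(timeit.repeat('a.sum(); dummy.sum()', setup, number=10))
t_dummy = min(timeit.repeat('dummy.sum()', setup, number=10))
print("non-cached estimate per run: %f s" % ((t_both - t_dummy) / 10))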

>
> (Also a guard in timeit against CPU frequency scaling errors would be
> great :-) Like simply outputting a warning if frequency scaling is
> detected).

Sorry, I don't get this one.

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adding a 2D with a 1D array...

2009-09-10 Thread Dag Sverre Seljebotn
Francesc Alted wrote:
> A Wednesday 09 September 2009 20:17:20 Dag Sverre Seljebotn escrigué:
> > Ruben Salvador wrote:
> > > Your results are what I expected...but. This code is called from my
> > > main program, and what I have in there (output array already created
> > > for both cases) is:
> > >
> > > print "lambd", lambd
> > > print "np.shape(a)", np.shape(a)
> > > print "np.shape(r)", np.shape(r)
> > > print "np.shape(offspr)", np.shape(offspr)
> > > t = clock()
> > > for i in range(lambd):
> > >     offspr[i] = r[i] + a[i]
> > > t1 = clock() - t
> > > print "For loop time ==> %.8f seconds" % t1
> > > t2 = clock()
> > > offspr = r + a[:,None]
> > > t3 = clock() - t2
> > > print "Pythonic time ==> %.8f seconds" % t3
> > >
> > > The results I obtain are:
> > >
> > > lambd 8
> > > np.shape(a) (8,)
> > > np.shape(r) (8, 26)
> > > np.shape(offspr) (8, 26)
> > > For loop time ==> 0.34528804 seconds
> > > Pythonic time ==> 0.35956192 seconds
> > >
> > > Maybe I'm not measuring properly, so, how should I do it?
> >
> > Like Luca said, you are not including the creation time of offspr in
> > the for-loop version. A fairer comparison would be
> >
> > offspr[...] = r + a[:, None]
> >
> > Even fairer (one less temporary copy):
> >
> > offspr[...] = r
> > offspr += a[:, None]
> >
> > Of course, see how the trend is for larger N as well.
> >
> > Also your timings are a bit crude (though this depends on how many
> > times you ran your script to check :-)). To get better measurements,
> > use the timeit module, or (easier) IPython and the %timeit command.
>
> Oh well, the art of benchmarking :)
>
> The timeit module normally lets you get less jitter in timings because
> it loops on doing the same operation repeatedly and takes the mean.
> However, this has the drawback of filling your cache with the datasets
> (or part of them) so, in the end, your measurements with timeit do not
> take into account the time to transmit the data from main memory into
> the CPU caches, and that may not be what you want to measure.

Do you see any issues with this approach: Add a flag to timeit to provide 
two modes:

a) Do an initial run which is always not included in timings (in fact, 
as it gets "min" and not "mean", I think this is the current behaviour)

b) Do something else between every run which should clear out the cache 
(like, just do another big dummy calculation).

(Also a guard in timeit against CPU frequency scaling errors would be 
great :-) Like simply outputting a warning if frequency scaling is 
detected).

-- 
Dag Sverre
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adding a 2D with a 1D array...

2009-09-10 Thread Francesc Alted
A Thursday 10 September 2009 11:43:44 Ruben Salvador escrigué:
> OK. Thanks everybody :D
> But...what is happening now? When executing this code:
>
> print ' .  object parameters mutation .'
> print 'np.shape(offspr)', np.shape(offspr)
> print 'np.shape(offspr[0])', np.shape(offspr[0])
> print "np.shape(r)", np.shape(r)
> print "np.shape(offspr_sigma)", np.shape(offspr_sigma)
> a = offspr_sigma * np.random.normal(0, 1, shp_sigma)
> print "np.shape(a)", np.shape(a)
> t4 = clock()
> offspr[...] = r
> offspr += a[:,None]
> t5 = clock() - t4
> print "Pythonic time (no array creation) ==> %.8f seconds" % t5
> t2 = clock()
> offspr = r + a[:,None]
> t3 = clock() - t2
> print "Pythonic time ==> %.8f seconds" % t3
> t = clock()
> for i in range(lambd):
>     offspr[i] = r[i] + a[i]
> t1 = clock() - t
> print "For loop time ==> %.8f seconds" % t1
>
> what I get is
[clip]

What's your definition of offspr?  Please always try to send self-contained 
code snippets so that other people can better help you.

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Rohit Garg
> The point is: are GPUs prepared to compete with general-purpose CPUs in
> all-road operations, like evaluating transcendental functions, conditionals,
> all of this with a rich set of data types?
Yup.

-- 
Rohit Garg

http://rpg-314.blogspot.com/

Senior Undergraduate
Department of Physics
Indian Institute of Technology
Bombay
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adding a 2D with a 1D array...

2009-09-10 Thread Ruben Salvador
OK. Thanks everybody :D
But...what is happening now? When executing this code:

print ' .  object parameters mutation .'
print 'np.shape(offspr)', np.shape(offspr)
print 'np.shape(offspr[0])', np.shape(offspr[0])
print "np.shape(r)", np.shape(r)
print "np.shape(offspr_sigma)", np.shape(offspr_sigma)
a = offspr_sigma * np.random.normal(0, 1, shp_sigma)
print "np.shape(a)", np.shape(a)
t4 = clock()
offspr[...] = r
offspr += a[:,None]
t5 = clock() - t4
print "Pythonic time (no array creation) ==> %.8f seconds" % t5
t2 = clock()
offspr = r + a[:,None]
t3 = clock() - t2
print "Pythonic time ==> %.8f seconds" % t3
t = clock()
for i in range(lambd):
    offspr[i] = r[i] + a[i]
t1 = clock() - t
print "For loop time ==> %.8f seconds" % t1

what I get is

 .  object parameters mutation .
np.shape(offspr) (8, 26)
np.shape(offspr[0]) (26,)
np.shape(r) (8, 26)
np.shape(offspr_sigma) (8,)
np.shape(a) (8,)
Traceback (most recent call last):
  File "/home/rsalvador/wavelets/devel/testing/genwave.py", line 660, in <module>
    main()
  File "/home/rsalvador/wavelets/devel/testing/genwave.py", line 390, in main
    mutate_strat, tau_global, tau_params)
  File "/home/rsalvador/wavelets/devel/testing/genwavelib.py", line 299, in mutate
    offspr[...] = r
TypeError: list indices must be integers

WTF?

On 9/10/09, Francesc Alted wrote:
> A Wednesday 09 September 2009 20:17:20 Dag Sverre Seljebotn escrigué:
> > Ruben Salvador wrote:
> > > Your results are what I expected...but. This code is called from my
> > > main program, and what I have in there (output array already created
> > > for both cases) is:
> > >
> > > print "lambd", lambd
> > > print "np.shape(a)", np.shape(a)
> > > print "np.shape(r)", np.shape(r)
> > > print "np.shape(offspr)", np.shape(offspr)
> > > t = clock()
> > > for i in range(lambd):
> > >     offspr[i] = r[i] + a[i]
> > > t1 = clock() - t
> > > print "For loop time ==> %.8f seconds" % t1
> > > t2 = clock()
> > > offspr = r + a[:,None]
> > > t3 = clock() - t2
> > > print "Pythonic time ==> %.8f seconds" % t3
> > >
> > > The results I obtain are:
> > >
> > > lambd 8
> > > np.shape(a) (8,)
> > > np.shape(r) (8, 26)
> > > np.shape(offspr) (8, 26)
> > > For loop time ==> 0.34528804 seconds
> > > Pythonic time ==> 0.35956192 seconds
> > >
> > > Maybe I'm not measuring properly, so, how should I do it?
> >
> > Like Luca said, you are not including the creation time of offspr in
> > the for-loop version. A fairer comparison would be
> >
> > offspr[...] = r + a[:, None]
> >
> > Even fairer (one less temporary copy):
> >
> > offspr[...] = r
> > offspr += a[:, None]
> >
> > Of course, see how the trend is for larger N as well.
> >
> > Also your timings are a bit crude (though this depends on how many
> > times you ran your script to check :-)). To get better measurements,
> > use the timeit module, or (easier) IPython and the %timeit command.
>
> Oh well, the art of benchmarking :)
>
> The timeit module normally lets you get less jitter in timings because
> it loops on doing the same operation repeatedly and takes the mean.
> However, this has the drawback of filling your cache with the datasets
> (or part of them) so, in the end, your measurements with timeit do not
> take into account the time to transmit the data from main memory into
> the CPU caches, and that may not be what you want to measure.
>
> In the case of Ruben, I think what he is seeing are cache effects. Maybe
> if he does a loop, he would finally see the difference coming up
> (although this may not be what he wants, of course ;-)
>
> --
> Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Matthieu Brucher
> Sure. Specially because NumPy is all about embarrasingly parallel problems
> (after all, this is how an ufunc works, doing operations
> element-by-element).
>
> The point is: are GPUs prepared to compete with general-purpose CPUs in
> all-road operations, like evaluating transcendental functions, conditionals,
> all of this with a rich set of data types? I would like to believe that this
> is the case, but I don't think so (at least not yet).

A lot of nVidia's SDK functions are not done on the GPU. There are some
functions that they provide where the actual computation is done on
the CPU, not on the GPU (I don't have an example here, but nVidia's
forum is full of examples ;))

Matthieu
-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Sturla Molden
Francesc Alted skrev:
>
> Numexpr already uses the Python parser, instead of building a new one. 
> However the bytecode emitted after the compilation process is 
> different, of course.
>
> Also, I don't see the point in requiring immutable buffers. Could you 
> develop this further?
>
If you do lazy evaluation, a function like this could fail without 
immutable buffers:

def foobar(x):
    y = a*x[:] + b
    x[0] = 0   # affects y and anything else depending on x
    return y

Immutable buffers are not required, one could document the oddity, but 
coding would be very error-prone.


S.M.


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Gael Varoquaux
On Thu, Sep 10, 2009 at 11:29:49AM +0200, Francesc Alted wrote:
> The point is: are GPUs prepared to compete with general-purpose CPUs in
> all-road operations, like evaluating transcendental functions,
> conditionals, all of this with a rich set of data types? I would like to
> believe that this is the case, but I don't think so (at least not yet).

I believe (this is very foggy) that GPUs can implement non-trivial logic
on their base processing unit, so that conditionals and transcendental
functions are indeed possible. Where it gets hard is when you don't have
problems that can be expressed in an embarrassingly parallel manner.
There are solutions there too (I believe of the message-passing type);
after all, matrix multiplication is done on GPUs.

Gaël
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Francesc Alted
A Thursday 10 September 2009 11:20:21 Gael Varoquaux escrigué:
> On Thu, Sep 10, 2009 at 10:36:27AM +0200, Francesc Alted wrote:
> >Where are you getting this info from? IMO the technology of memory in
> >graphics boards cannot be so different than in commercial
> > motherboards. It could be a *bit* faster (at the expenses of packing less
> > of it), but I'd say not as much as 4x faster (100 GB/s vs 25 GB/s of
> > Intel i7 in sequential access), as you are suggesting. Maybe this is GPU
> > cache bandwidth?
>
> I believe this is simply because the transfers is made in parallel to the
> different processing units of the graphic card. So we are back to
> importance of embarrassingly parallel problems and specifying things with
> high-level operations rather than for loop.

Sure.  Especially because NumPy is all about embarrassingly parallel problems 
(after all, this is how an ufunc works, doing operations element-by-element).
The point is: are GPUs prepared to compete with general-purpose CPUs in all-
road operations, like evaluating transcendental functions, conditionals, all of 
this with a rich set of data types?  I would like to believe that this is the 
case, but I don't think so (at least not yet).

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Francesc Alted
A Thursday 10 September 2009 10:58:13 Rohit Garg escrigué:
> > Where are you getting this info from? IMO the technology of memory in
> > graphics boards cannot be so different than in commercial motherboards.
> > It could be a *bit* faster (at the expenses of packing less of it), but
> > I'd say not as much as 4x faster (100 GB/s vs 25 GB/s of Intel i7 in
> > sequential access), as you are suggesting. Maybe this is GPU cache
> > bandwidth?
>
> This is publicly documented. You can start off by looking at the
> wikipedia stuff.
>
> For reference,
>
> gtx280-->141GBps-->has 1GB
> ati4870-->115GBps-->has 1GB
> ati5870-->153GBps (launches sept 22, 2009)-->2GB models will be there too
>
> Next gen nv gpu's will *assuredly* have bandwidth in excess of 200 GBps.
>
> This is *off chip memory bandwidth* from graphics memory (aka video
> ram). GPU have (very small) caches but they don't reduce memory
> latency.

That's nice to see.  I think I'll change my mind if someone could perform a 
vector-vector multiplication (an operation that is typically memory-bounded) in 
double precision up to 5x faster on a gtx280 nv card than on an Intel i7 CPU.

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Gael Varoquaux
On Thu, Sep 10, 2009 at 10:36:27AM +0200, Francesc Alted wrote:
>Where are you getting this info from? IMO the technology of memory in
>graphics boards cannot be so different than in commercial motherboards. It
>could be a *bit* faster (at the expenses of packing less of it), but I'd
>say not as much as 4x faster (100 GB/s vs 25 GB/s of Intel i7 in
>sequential access), as you are suggesting. Maybe this is GPU cache
>bandwidth?

I believe this is simply because the transfers are made in parallel to the
different processing units of the graphics card. So we are back to the
importance of embarrassingly parallel problems and specifying things with
high-level operations rather than for loops.

Gaël
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Francesc Alted
A Thursday 10 September 2009 11:11:22 Sturla Molden escrigué:
> Citi, Luca skrev:
> > That is exactly why numexpr is faster in these cases.
> > I hope one day numpy will be able to perform such
> > optimizations.
>
> I think it is going to require lazy evaluation. Whenever possible, an
> operator would just return a symbolic representation of the operation.
> This would gradually build up a tree of operators and buffers. When
> someone tries to read the data from an array, the buffer is created
> on-demand by flushing procratinated expressions. One must be sure that
> the buffers referenced in an incomplete expression never change. This
> would be easiest to ensure with immutable buffers.  Numexpr is the kind
> of  back-end a system like this would require.  But a lot of the code in
> numexpr can be omitted because Python creates the parse tree; we would
> not need the expression parser in numexpr as frontend. Well... this plan
> is gradually getting closer to a specialized SciPy JIT-compiler. It would
> be fun to make if I could find time for it.

Numexpr already uses the Python parser, instead of building a new one.  However 
the bytecode emitted after the compilation process is different, of course.

Also, I don't see the point in requiring immutable buffers.  Could you develop 
this further?

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adding a 2D with a 1D array...

2009-09-10 Thread Ruben Salvador
OK. I get the idea, but I can't see it. In both cases, as the print
statement shows, offspr is already created.

I need light :S

On Wed, Sep 9, 2009 at 8:17 PM, Dag Sverre Seljebotn <
da...@student.matnat.uio.no> wrote:

> Ruben Salvador wrote:
> > Your results are what I expected...but. This code is called from my main
> > program, and what I have in there (output array already created for both
> > cases) is:
> >
> > print "lambd", lambd
> > print "np.shape(a)", np.shape(a)
> > print "np.shape(r)", np.shape(r)
> > print "np.shape(offspr)", np.shape(offspr)
> > t = clock()
> > for i in range(lambd):
> >     offspr[i] = r[i] + a[i]
> > t1 = clock() - t
> > print "For loop time ==> %.8f seconds" % t1
> > t2 = clock()
> > offspr = r + a[:,None]
> > t3 = clock() - t2
> > print "Pythonic time ==> %.8f seconds" % t3
> >
> > The results I obtain are:
> >
> > lambd 8
> > np.shape(a) (8,)
> > np.shape(r) (8, 26)
> > np.shape(offspr) (8, 26)
> > For loop time ==> 0.34528804 seconds
> > Pythonic time ==> 0.35956192 seconds
> >
> > Maybe I'm not measuring properly, so, how should I do it?
>
> Like Luca said, you are not including the creation time of offspr in the
> for-loop version. A fairer comparison would be
>
> offspr[...] = r + a[:, None]
>
> Even fairer (one less temporary copy):
>
> offspr[...] = r
> offspr += a[:, None]
>
> Of course, see how the trend is for larger N as well.
>
> Also your timings are a bit crude (though this depends on how many times
> you ran your script to check :-)). To get better measurements, use the
> timeit module, or (easier) IPython and the %timeit command.
>
> >
> > On Wed, Sep 9, 2009 at 1:20 PM, Citi, Luca wrote:
> >
> > I am sorry but it doesn't make much sense.
> > How do you measure the performance?
> > Are you sure you include the creation of the "c" output array in the
> > time spent (which is outside the for loop but should be considered
> > anyway)?
> >
> > Here are my results...
> >
> > In [84]: a = np.random.rand(8,26)
> >
> > In [85]: b = np.random.rand(8)
> >
> > In [86]: def o(a,b):
> >   : c = np.empty_like(a)
> >   : for i in range(len(a)):
> >   : c[i] = a[i] + b[i]
> >   : return c
> >   :
> >
> > In [87]: d = a + b[:,None]
> >
> > In [88]: (d == o(a,b)).all()
> > Out[88]: True
> >
> > In [89]: %timeit o(a,b)
> > 1 loops, best of 3: 36.8 µs per loop
> >
> > In [90]: %timeit d = a + b[:,None]
> > 10 loops, best of 3: 5.17 µs per loop
> >
> > In [91]: a = np.random.rand(8,26)
> >
> > In [92]: b = np.random.rand(8)
> >
> > In [93]: %timeit o(a,b)
> > 10 loops, best of 3: 287 ms per loop
> >
> > In [94]: %timeit d = a + b[:,None]
> > 100 loops, best of 3: 15.4 ms per loop
> >
>
>
> --
> Dag Sverre
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Sturla Molden
Rohit Garg skrev:
> gtx280-->141GBps-->has 1GB
> ati4870-->115GBps-->has 1GB
> ati5870-->153GBps (launches sept 22, 2009)-->2GB models will be there too
>   
That is going to help if buffers are kept in graphics memory. But the 
problem is that graphics memory is a scarce resource.

S.M.






___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Sturla Molden
Citi, Luca skrev:
> That is exactly why numexpr is faster in these cases.
> I hope one day numpy will be able to perform such
> optimizations.
>   
I think it is going to require lazy evaluation. Whenever possible, an 
operator would just return a symbolic representation of the operation. 
This would gradually build up a tree of operators and buffers. When 
someone tries to read the data from an array, the buffer is created 
on-demand by flushing procrastinated expressions. One must be sure that 
the buffers referenced in an incomplete expression never change. This 
would be easiest to ensure with immutable buffers.  Numexpr is the kind 
of back-end a system like this would require.  But a lot of the code in 
numexpr can be omitted, because Python creates the parse tree; we would 
not need the expression parser in numexpr as a frontend. Well... this plan 
is gradually getting closer to a specialized SciPy JIT-compiler. It would 
be fun to make if I could find time for it.
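
A toy sketch of the idea (nothing like numexpr's real internals, just the
shape of it):

import numpy as np

class Expr(object):
    # Operators build a tree instead of computing immediately.
    def __add__(self, other):
        return Node(np.add, self, other)
    def __mul__(self, other):
        return Node(np.multiply, self, other)

class Node(Expr):
    def __init__(self, func, left, right):
        self.func, self.left, self.right = func, left, right
    def evaluate(self):
        # Reading the data "flushes" the procrastinated expression.
        return self.func(self.left.evaluate(), self.right.evaluate())

class Leaf(Expr):
    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)
    def evaluate(self):
        return self.data

a, x, b = Leaf([1.0, 2.0]), Leaf([3.0, 4.0]), Leaf([0.5, 0.5])
y = a * x + b        # just a tree; no arithmetic has happened yet
print(y.evaluate())  # [ 3.5  8.5]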

Sturla Molden


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Rohit Garg
> Where are you getting this info from? IMO the technology of memory in
> graphics boards cannot be so different than in commercial motherboards. It
> could be a *bit* faster (at the expenses of packing less of it), but I'd say
> not as much as 4x faster (100 GB/s vs 25 GB/s of Intel i7 in sequential
> access), as you are suggesting. Maybe this is GPU cache bandwidth?

This is publicly documented. You can start off by looking at the
wikipedia stuff.

For reference,

gtx280-->141GBps-->has 1GB
ati4870-->115GBps-->has 1GB
ati5870-->153GBps (launches sept 22, 2009)-->2GB models will be there too

Next gen nv gpu's will *assuredly* have bandwidth in excess of 200 GBps.

This is *off chip memory bandwidth* from graphics memory (aka video
ram). GPUs have (very small) caches, but they don't reduce memory
latency.

>
> --
> Francesc Alted



-- 
Rohit Garg

http://rpg-314.blogspot.com/

Senior Undergraduate
Department of Physics
Indian Institute of Technology
Bombay
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adding a 2D with a 1D array...

2009-09-10 Thread Francesc Alted
A Wednesday 09 September 2009 20:17:20 Dag Sverre Seljebotn escrigué:
> Ruben Salvador wrote:
> > Your results are what I expected...but. This code is called from my main
> > program, and what I have in there (output array already created for both
> > cases) is:
> >
> > print "lambd", lambd
> > print "np.shape(a)", np.shape(a)
> > print "np.shape(r)", np.shape(r)
> > print "np.shape(offspr)", np.shape(offspr)
> > t = clock()
> > for i in range(lambd):
> > offspr[i] = r[i] + a[i]
> > t1 = clock() - t
> > print "For loop time ==> %.8f seconds" % t1
> > t2 = clock()
> > offspr = r + a[:,None]
> > t3 = clock() - t2
> > print "Pythonic time ==> %.8f seconds" % t3
> >
> > The results I obtain are:
> >
> > lambd 8
> > np.shape(a) (8,)
> > np.shape(r) (8, 26)
> > np.shape(offspr) (8, 26)
> > For loop time ==> 0.34528804 seconds
> > Pythonic time ==> 0.35956192 seconds
> >
> > Maybe I'm not measuring properly, so, how should I do it?
>
> Like Luca said, you are not including the creation time of offspr in the
> for-loop version. A fairer comparison would be
>
> offspr[...] = r + a[:, None]
>
> Even fairer (one less temporary copy):
>
> offspr[...] = r
> offspr += a[:, None]
>
> Of course, see how the trend is for larger N as well.
>
> Also your timings are a bit crude (though this depends on how many times
> you ran your script to check :-)). To get better measurements, use the
> timeit module, or (easier) IPython and the %timeit command.

Oh well, the art of benchmarking :)

The timeit module normally lets you get less jitter in timings because it 
loops on doing the same operation repeatedly and takes the mean.  However, this 
has the drawback of filling your cache with the datasets (or part of them) so, 
in the end, your measurements with timeit do not take into account the time 
to transmit the data from main memory into the CPU caches, and that may not 
be what you want to measure.

In the case of Ruben, I think what he is seeing are cache effects.  Maybe if 
he does a loop, he would finally see the difference coming up (although this 
may not be what he wants, of course ;-)

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adding a 2D with a 1D array...

2009-09-10 Thread Citi, Luca
Hi Ruben,

> In both cases, as the print
> statement shows, offspr is already created.

>>> offspr[...] = r + a[:, None]
means "fill the existing object pointed by offspr with r + a[:, None]" while
>>> offspr = r + a[:,None]
means "create a new array and assign it to the variable offspr (after 
decref-ing the object previously pointed by offspr)"
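
A tiny demonstration (shapes taken from earlier in the thread):

import numpy as np

offspr = np.zeros((8, 26))
r = np.random.rand(8, 26)
a = np.random.rand(8)

buf = offspr
offspr[...] = r + a[:, None]   # fills the existing buffer in place
print(offspr is buf)           # True: still the same object

offspr = r + a[:, None]        # rebinds the name to a brand-new array
print(offspr is buf)           # False: buf keeps the old buffer alive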

Best,
Luca

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Citi, Luca
Hi Sturla,

> The proper way to speed up "dot(a*b+c*sqrt(d), e)" is to get rid of 
> temporary intermediates.
I implemented a patch 
http://projects.scipy.org/numpy/ticket/1153
that reduces the number of temporary intermediates.
In your example from 4 to 2.
There is a big improvement in terms of memory footprint,
and some improvement in terms of speed (especially for
large matrices) but not as much as I expected.

In your example
> result = 0
> for i in range(n):
> result += (a[i]*b[i] + c[i]*sqrt(d[i])) * e[i]
another big speedup could come from the fact that it
makes better use of the cache.

That is exactly why numexpr is faster in these cases.
I hope one day numpy will be able to perform such
optimizations.
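
For example (a sketch, assuming numexpr is installed; the dot itself
stays in numpy):

import numpy as np
import numexpr as ne

n = 1000000
a, b, c, d, e = [np.random.rand(n) for _ in range(5)]

r1 = np.dot(a*b + c*np.sqrt(d), e)               # several full-size temporaries
r2 = np.dot(ne.evaluate("a*b + c*sqrt(d)"), e)   # blockwise, cache-friendly

print(np.allclose(r1, r2))   # True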

Best,
Luca
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Francesc Alted
A Thursday 10 September 2009 09:45:29 Rohit Garg escrigué:
> > You do realize that the throughput from onboard (video) RAM is going
> > to be much higher, right? It's not just the parallelization but the
> > memory bandwidth. And as James pointed out, if you can keep most of
> > your intermediate computation on-card, you stand to benefit immensely,
> > even if doing some operations where the GPU provides no tangible
> > benefit (i.e. the benefit is in aggregate and avoiding copies).
>
> Good point made here. GPU's support bandwidth O(100 GBps) (bytes not
> bits). Upcoming GPU's will likely break the 250 GBps mark. Even if
> your expressions involve low operation/memory ratios, GPU's are a big
> win as their memory bandwidth is higher than CPU's L2 and even L1
> caches.

Where are you getting this info from?  IMO the technology of memory in 
graphics boards cannot be so different than in commercial motherboards.  It 
could be a *bit* faster (at the expense of packing less of it), but I'd say 
not as much as 4x faster (100 GB/s vs 25 GB/s of Intel i7 in sequential 
access), as you are suggesting.  Maybe this is GPU cache bandwidth?

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Rohit Garg
> You do realize that the throughput from onboard (video) RAM is going
> to be much higher, right? It's not just the parallelization but the
> memory bandwidth. And as James pointed out, if you can keep most of
> your intermediate computation on-card, you stand to benefit immensely,
> even if doing some operations where the GPU provides no tangible
> benefit (i.e. the benefit is in aggregate and avoiding copies).

Good point made here. GPUs support bandwidth of O(100 GBps) (bytes, not
bits). Upcoming GPUs will likely break the 250 GBps mark. Even if
your expressions involve low operation/memory ratios, GPUs are a big
win, as their memory bandwidth is higher than the CPU's L2 and even L1
caches.

Regards,

-- 
Rohit Garg

http://rpg-314.blogspot.com/

Senior Undergraduate
Department of Physics
Indian Institute of Technology
Bombay
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Huge arrays

2009-09-10 Thread David Cournapeau
Kim Hansen wrote:
>
> On 9-Sep-09, at 4:48 AM, Francesc Alted wrote:
>
> > Yes, this later is supported in PyTables as long as the underlying
> > filesystem
> > supports files > 2 GB, which is very usual in modern operating
> > systems.
>
> I think the OP said he was on Win32, in which case it should be noted:
> FAT32 has its upper file size limit at 4GB (minus one byte), so
> storing both your arrays as one file on a FAT32 partition is a no-no.
>
> David
>
>  
> Strange, I work on Win32 systems, and there I have no problems storing
> data files up to 600 GB (have not tried larger) in size stored on
> RAID0 disk systems of 2x1TB, I can also open them and seek in them
> using Python.

It is a FAT32 limitation, not a Windows limitation. NTFS should handle
large files without much trouble, and I believe the vast majority of
Windows installations (>= Windows XP) use NTFS and not FAT32. I
certainly have not seen Windows installed on FAT32 for a very long time.

cheers,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Huge arrays

2009-09-10 Thread Kim Hansen
>
> On 9-Sep-09, at 4:48 AM, Francesc Alted wrote:
>
> > Yes, this later is supported in PyTables as long as the underlying
> > filesystem
> > supports files > 2 GB, which is very usual in modern operating
> > systems.
>
> I think the OP said he was on Win32, in which case it should be noted:
> FAT32 has its upper file size limit at 4GB (minus one byte), so
> storing both your arrays as one file on a FAT32 partition is a no-no.
>
> David
>

Strange, I work on Win32 systems, and there I have no problems storing data
files up to 600 GB (have not tried larger) in size stored on RAID0 disk
systems of 2x1TB; I can also open them and seek in them using Python. For
those data files, I use PyTables lzo-compressed h5 files to create and
maintain an index to the large data file. Besides some metadata describing
chunks of data, the index also contains a data position value stating what
the file position of the beginning of each data chunk (payload) is. The
index files I work with in h5 format are not larger than 1.5 GB, though.

It all works very nicely and it is very convenient.

Kim
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] error: comma at end of enumerator list

2009-09-10 Thread Mads Ipsen
Hey,

When I try to compile a swig based interface to NumPy, I get the error:

  lib/python2.6/site-packages/numpy/core/include/numpy/npy_common.h:11: 
error: comma at end of enumerator list

In npy_common.h, changing

/* enums for detected endianness */
enum {
NPY_CPU_UNKNOWN_ENDIAN,
NPY_CPU_LITTLE,
NPY_CPU_BIG,
};


to

/* enums for detected endianness */
enum {
NPY_CPU_UNKNOWN_ENDIAN,
NPY_CPU_LITTLE,
NPY_CPU_BIG
};

fixes the issue. I believe this should be fixed. At least we cannot 
build our software without the above fix.

System info:

gcc version 4.3.2 (Ubuntu 4.3.2-1ubuntu12)
Ubuntu 8.10

Best regards,

Mads

-- 
+------------------------------------------------------------+
| Mads Ipsen, Scientific developer                           |
+------------------------------+-----------------------------+
| QuantumWise A/S              | phone: +45-29716388         |
| Nørresøgade 27A              | www:   www.quantumwise.com  |
| DK-1370 Copenhagen, Denmark  | email: m...@quantumwise.com |
+------------------------------+-----------------------------+


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion