Re: [Numpy-discussion] NumPy re-factoring project

2010-06-15 Thread Sturla Molden
On 15.06.2010 18:30, Sturla Molden wrote: > A very radical solution would be to get rid of all C, and go for a > "pure Python" solution. NumPy could build up a text string with OpenCL > code on the fly, and use the OpenCL driver as a "JIT compiler" for > fast array expressions. Most GPUs and CP

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-15 Thread Sturla Molden
A very radical solution would be to get rid of all C, and go for a "pure Python" solution. NumPy could build up a text string with OpenCL code on the fly, and use the OpenCL driver as a "JIT compiler" for fast array expressions. Most GPUs and CPUs will support OpenCL, and thus there will be no
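The idea of building OpenCL source as a text string can be illustrated with a minimal sketch. It only generates the kernel text for an elementwise expression; compiling and launching it would need an OpenCL binding (for example PyOpenCL), which is not shown here. The helper name `build_elementwise_kernel` and its parameters are hypothetical and do not come from the thread.

```python
import re

def build_elementwise_kernel(name, expr, inputs, ctype="float"):
    """Return OpenCL C source computing out[i] = expr for each element i."""
    # Turn every input name into an indexed load, e.g. "x" -> "x[i]".
    pattern = r"\b(" + "|".join(re.escape(v) for v in inputs) + r")\b"
    body = re.sub(pattern, r"\1[i]", expr)
    # One read-only global pointer per input, plus one output pointer.
    params = ", ".join(f"__global const {ctype} *{v}" for v in inputs)
    return (
        f"__kernel void {name}({params}, __global {ctype} *out)\n"
        "{\n"
        "    size_t i = get_global_id(0);\n"
        f"    out[i] = {body};\n"
        "}\n"
    )

# Hypothetical usage: source text for y = b*x + a, evaluated elementwise.
source = build_elementwise_kernel("axpy", "b*x + a", ["a", "b", "x"])
print(source)
```

Because the whole expression ends up in one kernel body, no intermediate array for `b*x` is needed on the device.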

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-13 Thread Pauli Virtanen
Sun, 13 Jun 2010 06:54:29 +0200, Sturla Molden wrote: [clip: memory management only in the interface] You forgot views: if memory management is done in the interface layer, it must also make sure that the memory pointed to by a view is never moved around, and not freed before all the views are f

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-12 Thread Sturla Molden
On 13.06.2010 05:47, David Cournapeau wrote: > > This only works in simple cases. What do you do when you don't know > the output size? First: If you don't know, you don't know. Then you're screwed and C is not going to help. Second: If we cannot figure out how much to allocate before starting

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-12 Thread David Cournapeau
On Sun, Jun 13, 2010 at 11:39 AM, Sturla Molden wrote: > If NumPy does not allocate memory on its own, there will be no leaks > due to errors in NumPy. > > There is still work to do in the core, i.e. the computational loops in > array operators, broadcasting, ufuncs, copying data between buffers

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-12 Thread Sturla Molden
On 13.06.2010 02:39, David Cournapeau wrote: > > But the point is to get rid of the Python dependency, and if you don't > allow any API call to allocate memory, there is not much left to > implement in the core. > > Memory allocation is platform dependent. A CPython version could use bytearr

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-12 Thread David Cournapeau
On Sun, Jun 13, 2010 at 2:00 AM, Sturla Molden wrote: > On 12.06.2010 15:57, David Cournapeau wrote: >> Anything non trivial will require memory allocation and object >> ownership conventions. If the goal is interoperation with other >> languages and vm, you may want to use something else than pl

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-12 Thread Charles R Harris
On Sat, Jun 12, 2010 at 2:56 PM, Dag Sverre Seljebotn < da...@student.matnat.uio.no> wrote: > Charles Harris wrote: > > On Sat, Jun 12, 2010 at 11:38 AM, Dag Sverre Seljebotn < > > da...@student.matnat.uio.no> wrote: > > > >> Christopher Barker wrote: > >> > David Cournapeau wrote: > >> >>> In the

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-12 Thread Dag Sverre Seljebotn
Charles Harris wrote: > On Sat, Jun 12, 2010 at 11:38 AM, Dag Sverre Seljebotn < > da...@student.matnat.uio.no> wrote: > >> Christopher Barker wrote: >> > David Cournapeau wrote: >> >>> In the core C numpy library there would be new "numpy_array" struct >> >>> with attributes >> >>> >> >>> numpy_

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-12 Thread Sebastian Walter
On Sat, Jun 12, 2010 at 3:57 PM, David Cournapeau wrote: > On Sat, Jun 12, 2010 at 10:27 PM, Sebastian Walter > wrote: >> On Thu, Jun 10, 2010 at 6:48 PM, Sturla Molden wrote: >>> >>> I have a few radical suggestions: >>> >>> 1. Use ctypes as glue to the core DLL, so we can completely forget abo

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-12 Thread Benjamin Root
If I could, I would like to throw out another possible feature that might need to be taken into consideration for designing the implementation of numpy arrays. One thing I found somewhat lacking -- if that is the right term -- is a way to convolve a numpy array with an arbitrary windowing element.

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-12 Thread Francesc Alted
2010/6/12 Charles R Harris > > This is more the way I see things, except I would divide the bottom layer > into two parts, views and memory. The memory can come from many places -- > memmaps, user supplied buffers, etc. -- but we should provide a simple > reference counted allocator for the defau

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-12 Thread Charles R Harris
On Sat, Jun 12, 2010 at 1:35 PM, Charles R Harris wrote: > > > On Sat, Jun 12, 2010 at 11:38 AM, Dag Sverre Seljebotn < > da...@student.matnat.uio.no> wrote: > >> Christopher Barker wrote: >> > David Cournapeau wrote: >> >>> In the core C numpy library there would be new "numpy_array" struct >>

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-12 Thread Charles R Harris
On Sat, Jun 12, 2010 at 11:38 AM, Dag Sverre Seljebotn < da...@student.matnat.uio.no> wrote: > Christopher Barker wrote: > > David Cournapeau wrote: > >>> In the core C numpy library there would be new "numpy_array" struct > >>> with attributes > >>> > >>> numpy_array->buffer > > > >> Anything n

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-12 Thread Dag Sverre Seljebotn
Christopher Barker wrote: > David Cournapeau wrote: >>> In the core C numpy library there would be new "numpy_array" struct >>> with attributes >>> >>> numpy_array->buffer > >> Anything non trivial will require memory allocation and object >> ownership conventions. > > I totally agree -- I've bee

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-12 Thread Christopher Barker
David Cournapeau wrote: >> In the core C numpy library there would be new "numpy_array" struct >> with attributes >> >> numpy_array->buffer > Anything non trivial will require memory allocation and object > ownership conventions. I totally agree -- I've been thinking for a while about a core ar

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-12 Thread Sturla Molden
On 12.06.2010 15:57, David Cournapeau wrote: > Anything non trivial will require memory allocation and object > ownership conventions. If the goal is interoperation with other > languages and vm, you may want to use something else than plain > malloc, to interact better with the allocation strateg

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-12 Thread David Cournapeau
On Sat, Jun 12, 2010 at 10:27 PM, Sebastian Walter wrote: > On Thu, Jun 10, 2010 at 6:48 PM, Sturla Molden wrote: >> >> I have a few radical suggestions: >> >> 1. Use ctypes as glue to the core DLL, so we can completely forget about >> refcounts and similar mess. Why put manual reference counting

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-12 Thread Sebastian Walter
On Thu, Jun 10, 2010 at 6:48 PM, Sturla Molden wrote: > > I have a few radical suggestions: > > 1. Use ctypes as glue to the core DLL, so we can completely forget about > refcounts and similar mess. Why put manual reference counting and error > handling in the core? It's stupid. I totally agree,

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Pauli Virtanen
Fri, 11 Jun 2010 15:31:45 +0200, Sturla Molden wrote: [clip] >> The innermost dimension is handled via the ufunc loop, which is a >> simple for loop with constant-size step and is given a number of >> iterations. The array iterator objects are used only for stepping >> through the outer dimensions.
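The loop structure described in this message (a constant-step inner loop over the innermost dimension, with the outer dimensions stepped separately) can be sketched in pure Python. This is an illustration only, not NumPy's actual ufunc machinery: the function name `apply_unary` is hypothetical, strides are counted in elements rather than bytes, and source and destination are assumed to share one layout.

```python
from itertools import product

def apply_unary(op, src, dst, shape, strides):
    """Apply op elementwise; src and dst are flat buffers with the same layout."""
    *outer_shape, n = shape          # n: number of inner-loop iterations
    *outer_strides, step = strides   # step: constant inner-loop stride
    # The "iterator" part: walk every index of the outer dimensions.
    for idx in product(*(range(k) for k in outer_shape)):
        base = sum(i * s for i, s in zip(idx, outer_strides))
        # The inner loop: a plain for loop with a constant step.
        for j in range(n):
            dst[base + j * step] = op(src[base + j * step])

# Hypothetical usage: square a 2x3 row-major array stored in a flat list.
src = list(range(6))
dst = [0] * 6
apply_unary(lambda v: v * v, src, dst, shape=(2, 3), strides=(3, 1))
print(dst)
```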

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Sturla Molden
On 11.06.2010 17:17, Anne Archibald wrote: > > On the other hand, since memory reads are very slow, optimizations > that do more calculation per load/store could make a very big > difference, eliminating temporaries as a side effect. > Yes, that's the main issue, not the extra memory they use

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Dag Sverre Seljebotn
Sturla Molden wrote: > On 11.06.2010 09:14, Sebastien Binet wrote: >> it of course depends on the granularity at which you wrap and use >> numpy-core but tight loops calling ctypes ain't gonna be pretty >> performance-wise. >> > > Tight loops in Python are never pretty. > > The purpose of ve

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Anne Archibald
On 11 June 2010 11:12, Benjamin Root wrote: > > > On Fri, Jun 11, 2010 at 8:31 AM, Sturla Molden wrote: >> >> >> It would also make sense to evaluate expressions like "y = b*x + a" >> without a temporary array for b*x. I know roughly how to do it, but >> don't have time to look at it before next

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Benjamin Root
On Fri, Jun 11, 2010 at 8:31 AM, Sturla Molden wrote: > > > It would also make sense to evaluate expressions like "y = b*x + a" > without a temporary array for b*x. I know roughly how to do it, but > don't have time to look at it before next year. (Yes I know about > numexpr, I am talking about p
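To make the point about the temporary concrete, here is a small sketch contrasting the two evaluation orders for "y = b*x + a". It is not the approach Sturla had in mind (the message does not describe it); it only shows what "a temporary array for b*x" means. The function names are hypothetical, and a, b and x are assumed to be equal-length arrays of doubles.

```python
from array import array

def axpy_with_temporary(b, x, a):
    # Step 1 materializes b*x as a full-size temporary array.
    tmp = array("d", (bi * xi for bi, xi in zip(b, x)))
    # Step 2 makes a second pass over memory to add a.
    return array("d", (ti + ai for ti, ai in zip(tmp, a)))

def axpy_fused(b, x, a):
    # One pass: each element is loaded once and no temporary array exists.
    return array("d", (bi * xi + ai for bi, xi, ai in zip(b, x, a)))

a = array("d", [1.0, 1.0, 1.0])
b = array("d", [2.0, 2.0, 2.0])
x = array("d", [0.0, 1.0, 2.0])
y_tmp = axpy_with_temporary(b, x, a)
y_fused = axpy_fused(b, x, a)
print(list(y_fused))
```

Both functions return the same values; the fused version does more arithmetic per load and store, which is the property discussed in the thread.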

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Sturla Molden
On 11.06.2010 09:14, Sebastien Binet wrote: > it of course depends on the granularity at which you wrap and use > numpy-core but tight loops calling ctypes ain't gonna be pretty > performance-wise. > Tight loops in Python are never pretty. The purpose of vectorization with NumPy is to avoid

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Sturla Molden
On 11.06.2010 10:17, Pauli Virtanen wrote: >> 1. Collect an array of pointers to each subarray (e.g. using >> std::vector or dtype**) >> 2. Dispatch on the pointer array... >> > This is actually what the current ufunc code does. > > The innermost dimension is handled via the ufunc loop, whi

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Hans Meine
On Friday 11 June 2010 10:38:28 Pauli Virtanen wrote: > Fri, 11 Jun 2010 10:29:28 +0200, Hans Meine wrote: > > Ideally, algorithms would get wrapped in between two additional > > pre-/postprocessing steps: > > > > 1) Preprocessing: After broadcasting, transpose the input arrays such > > that they

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Pauli Virtanen
Fri, 11 Jun 2010 10:29:28 +0200, Hans Meine wrote: [clip] > Ideally, algorithms would get wrapped in between two additional > pre-/postprocessing steps: > > 1) Preprocessing: After broadcasting, transpose the input arrays such > that they become C order. More specifically, sort the strides of one
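The preprocessing step quoted here (sort the strides so the array is traversed in C order) can be sketched as a permutation of axes. This is one plausible reading of the truncated proposal, not code from the thread; the names `c_order_permutation` and `permute` are hypothetical, and strides are given in bytes for an array of 8-byte doubles.

```python
def c_order_permutation(strides):
    """Axis order that puts the largest stride first and the smallest last."""
    return sorted(range(len(strides)), key=lambda d: -abs(strides[d]))

def permute(shape, strides, perm):
    """Transpose a (shape, strides) description according to perm."""
    return tuple(shape[d] for d in perm), tuple(strides[d] for d in perm)

# Hypothetical input: a 3x4 array of doubles stored in Fortran order.
shape, strides = (3, 4), (8, 24)
perm = c_order_permutation(strides)
new_shape, new_strides = permute(shape, strides, perm)
print(perm, new_shape, new_strides)
```

After the permutation the last axis has the smallest stride, so a C-order loop nest touches memory sequentially; the postprocessing step would apply the inverse permutation to the result.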

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Hans Meine
On Thursday 10 June 2010 22:28:28 Pauli Virtanen wrote: > Some places where OpenMP could probably help are in the inner ufunc > loops. However, improving the memory efficiency of the data access > pattern is another low-hanging fruit for multidimensional arrays. I was about to mention this when th

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Pauli Virtanen
Thu, 10 Jun 2010 23:56:56 +0200, Sturla Molden wrote: [clip] > Also about array iterators in NumPy's C base (i.e. for doing something > along an axis): we don't need those. There is a different way of coding > which leads to faster code. > > 1. Collect an array of pointers to each subarray (e.g. u

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Francesc Alted
On Friday 11 June 2010 02:27:18 Sturla Molden wrote: > >> Another thing I did when reimplementing lfilter was "copy-in copy-out" > >> for strided arrays. > > > > What is copy-in copy-out? I am not familiar with this term. > > Strided memory access is slow. So it often helps to make a temporary
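A minimal sketch of the copy-in copy-out pattern described in this exchange: gather a strided vector into a contiguous temporary, run the algorithm on the temporary, and scatter the result back. It is not the lfilter code mentioned above; `apply_copy_in_copy_out` and `running_sum` are hypothetical names, and strides are counted in elements.

```python
from array import array

def apply_copy_in_copy_out(buf, start, step, count, func):
    """Run the in-place function func on a strided vector of buf."""
    stop = start + step * count
    tmp = array(buf.typecode, buf[start:stop:step])  # copy-in: contiguous copy
    func(tmp)                                        # work on contiguous memory
    buf[start:stop:step] = tmp                       # copy-out: write back

def running_sum(v):
    """Example in-place algorithm: cumulative sum along the vector."""
    for i in range(1, len(v)):
        v[i] += v[i - 1]

# Hypothetical usage: a 3x2 row-major array; column 0 has stride 2.
data = array("d", [0, 1, 2, 3, 4, 5])
apply_copy_in_copy_out(data, start=0, step=2, count=3, func=running_sum)
print(list(data))
```

The two extra copies pay off when the algorithm makes several passes over the vector, since every pass after the first then reads contiguous memory.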

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Sebastien Binet
On Fri, 11 Jun 2010 00:25:17 +0200, Sturla Molden wrote: > On 10.06.2010 22:07, Travis Oliphant wrote: > > > >> 2. The core should be a plain DLL, loadable with ctypes. (I know David > >> Cournapeau and Robert Kern are going to hate this.) But if Python can have > >> a custom loader for .pyd fil

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-10 Thread Charles R Harris
On Thu, Jun 10, 2010 at 8:40 PM, Sturla Molden wrote: > On 11.06.2010 03:02, Charles R Harris wrote: > > > > But for an initial refactoring it probably falls in the category of > > premature optimization. Another thing to avoid on the first go around > > is micro-optimization, as it tends to com

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-10 Thread Sturla Molden
On 11.06.2010 03:02, Charles R Harris wrote: > > But for an initial refactoring it probably falls in the category of > premature optimization. Another thing to avoid on the first go around > is micro-optimization, as it tends to complicate the code and often > doesn't do much for performance.

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-10 Thread Sturla Molden
On 11.06.2010 04:19, David wrote: > > Ah, ok, I did not know this was called copy-in/copy-out, thanks for the > explanation. I agree this would be a good direction to pursue, but maybe > out of scope for the first refactoring, > > Copy-in copy-out is actually an implementation detail in Fortr

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-10 Thread David
On 06/11/2010 09:27 AM, Sturla Molden wrote: > > Strided memory access is slow. So it often helps to make a temporary > copy that is contiguous. Ah, ok, I did not know this was called copy-in/copy-out, thanks for the explanation. I agree this would be a good direction to pursue, but maybe out

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-10 Thread David
On 06/11/2010 10:02 AM, Charles R Harris wrote: > > > But for an initial refactoring it probably falls in the category of > premature optimization. Another thing to avoid on the first go around is > micro-optimization, as it tends to complicate the code and often doesn't > do much for performance.

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-10 Thread Charles R Harris
On Thu, Jun 10, 2010 at 6:27 PM, Sturla Molden wrote: > On 11.06.2010 00:57, David Cournapeau wrote: > > Do you have the code for this? That's something I wanted to do, but > never took the time to do. Faster generic iterator would be nice, but > very hard to do in general. > > > > > > /* thi

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-10 Thread Sturla Molden
On 11.06.2010 00:57, David Cournapeau wrote: Do you have the code for this? That's something I wanted to do, but never took the time to do. Faster generic iterator would be nice, but very hard to do in general. /* this computes the start address for every vector along a dimension (axis)
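The C code in this message is cut off in the archive. As a rough Python analogue of what its comment describes (computing the start address of every vector along an axis), the following sketch returns byte offsets relative to the start of the buffer. It is an illustration under stated assumptions and not a reconstruction of the original code; `vector_starts` is a hypothetical name.

```python
from itertools import product

def vector_starts(shape, strides, axis):
    """Byte offsets of the first element of every 1-D vector along `axis`."""
    # Every dimension except `axis` contributes to the starting offset.
    outer_shape = [n for d, n in enumerate(shape) if d != axis]
    outer_strides = [s for d, s in enumerate(strides) if d != axis]
    return [
        sum(i * s for i, s in zip(idx, outer_strides))
        for idx in product(*(range(n) for n in outer_shape))
    ]

# Hypothetical usage: a C-contiguous 2x3 array of 8-byte doubles.
shape, strides = (2, 3), (24, 8)
row_starts = vector_starts(shape, strides, axis=1)  # one start per row
col_starts = vector_starts(shape, strides, axis=0)  # one start per column
print(row_starts, col_starts)
```

Once the offsets are collected, a routine that works on one vector at a time (with length `shape[axis]` and stride `strides[axis]`) can be dispatched over the list, which is the "array of pointers" approach mentioned elsewhere in the thread.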

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-10 Thread Sturla Molden
On 10.06.2010 22:07, Travis Oliphant wrote: > >> 2. The core should be a plain DLL, loadable with ctypes. (I know David >> Cournapeau and Robert Kern are going to hate this.) But if Python can have a >> custom loader for .pyd files, so can NumPy for its core DLL. For ctypes we >> just need to s

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-10 Thread David Cournapeau
On Fri, Jun 11, 2010 at 7:25 AM, Sturla Molden wrote: > On 10.06.2010 22:07, Travis Oliphant wrote: >> >>> 2. The core should be a plain DLL, loadable with ctypes. (I know David >>> Cournapeau and Robert Kern are going to hate this.) But if Python can have a >>> custom loader for .pyd files, so

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-10 Thread Pauli Virtanen
Thu, 10 Jun 2010 18:48:04 +0200, Sturla Molden wrote: [clip] > 5. Allow OpenMP pragmas in the core. If arrays are above a certain size, > it should switch to multi-threading. Some places where OpenMP could probably help are in the inner ufunc loops. However, improving the memory efficiency of the

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-10 Thread David Cournapeau
On Fri, Jun 11, 2010 at 1:18 AM, Sebastien Binet wrote: > On Thu, 10 Jun 2010 10:47:10 -0500, Jason McCampbell > wrote: >> > 4) Boost has some reference counted pointers, have you looked at them? C++ >> > is admittedly a very different animal for this sort of application. >> > >> >> There is als

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-10 Thread Sturla Molden
On 10.06.2010 22:28, Pauli Virtanen wrote: > > Some places where OpenMP could probably help are in the inner ufunc > loops. However, improving the memory efficiency of the data access > pattern is another low-hanging fruit for multidimensional arrays. > > Getting the intermediate array out of

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-10 Thread Travis Oliphant
On Jun 10, 2010, at 11:48 AM, Sturla Molden wrote: > > I have a few radical suggestions: There are some good ideas there. I suspect we can't address all of them in the course of this re-factoring effort, but I really appreciate you putting them out there, because they are useful things to c

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-10 Thread David Cournapeau
On Fri, Jun 11, 2010 at 6:56 AM, Sturla Molden wrote: > > Also about array iterators in NumPy's C base (i.e. for doing something > along an axis): we don't need those. There is a different way of coding > which leads to faster code. > > 1. Collect an array of pointers to each subarray (e.g. using

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-10 Thread Sturla Molden
Another suggestion I'd like to make is bytearray as memory buffer for the ndarray. An ndarray could just store or extend a bytearray, instead of having to deal with malloc/free and the mess that comes with it. Python takes care of the reference counts for bytearrays, and ctypes calls the
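A minimal sketch of the bytearray idea using only the standard library: Python owns the memory, and a ctypes array is laid over it without copying. This is not NumPy code and no ndarray is involved; the variable names are illustrative. The last part also shows a consequence relevant to the point about views made elsewhere in the thread: while the ctypes view exists, CPython refuses to resize the bytearray.

```python
import ctypes
import struct

n = 4
# Memory owned and reference-counted by Python; no malloc/free in user code.
buf = bytearray(n * ctypes.sizeof(ctypes.c_double))

# A ctypes array sharing the same memory (no copy is made).
view = (ctypes.c_double * n).from_buffer(buf)
for i in range(n):
    view[i] = 0.5 * i

# Writes through the ctypes view are visible in the bytearray itself.
values = struct.unpack(f"{n}d", buf)

# While the view is alive, the bytearray cannot be resized.
try:
    buf.extend(b"\x00")
    resize_blocked = False
except BufferError:
    resize_blocked = True

print(values, resize_blocked)
```

A C function loaded through ctypes could receive `view` (or `ctypes.addressof(view)`) as its data pointer, so the core library would never need to allocate the buffer itself.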

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-10 Thread Sturla Molden
On 10.06.2010 18:48, Sturla Molden wrote: > ctypes will also make porting to other Python implementations > (or even other languages: Ruby, JavaScript) easier. Not to mention > that it will make NumPy impervious to changes in the Python C API. Linking is also easier with ctypes. I starte

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-10 Thread Sturla Molden
I have a few radical suggestions: 1. Use ctypes as glue to the core DLL, so we can completely forget about refcounts and similar mess. Why put manual reference counting and error handling in the core? It's stupid. 2. The core should be a plain DLL, loadable with ctypes. (I know David Courna

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-10 Thread Sebastien Binet
On Thu, 10 Jun 2010 10:47:10 -0500, Jason McCampbell wrote: > > 4) Boost has some reference counted pointers, have you looked at them? C++ > > is admittedly a very different animal for this sort of application. > > > > There is also need to replace the usage of PyDict and other uses of CPython >

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-10 Thread Jason McCampbell
Hi Chuck, Good questions. Responses inline below... Jason On Thu, Jun 10, 2010 at 8:26 AM, Charles R Harris wrote: > > > On Wed, Jun 9, 2010 at 5:27 PM, Jason McCampbell < > jmccampb...@enthought.com> wrote: > >> Hi everyone, >> >> This is a follow-up to Travis's message on the re-factoring p

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-10 Thread Charles R Harris
On Thu, Jun 10, 2010 at 7:26 AM, Charles R Harris wrote: > > > On Wed, Jun 9, 2010 at 5:27 PM, Jason McCampbell < > jmccampb...@enthought.com> wrote: > >> Hi everyone, >> >> This is a follow-up to Travis's message on the re-factoring project from >> May 25th and the subsequent discussion. For bac

Re: [Numpy-discussion] NumPy re-factoring project

2010-06-10 Thread Charles R Harris
On Wed, Jun 9, 2010 at 5:27 PM, Jason McCampbell wrote: > Hi everyone, > > This is a follow-up to Travis's message on the re-factoring project from > May 25th and the subsequent discussion. For background, I am a developer at > Enthought working on the NumPy re-factoring project with Travis and Sc

[Numpy-discussion] NumPy re-factoring project

2010-06-09 Thread Jason McCampbell
Hi everyone, This is a follow-up to Travis's message on the re-factoring project from May 25th and the subsequent discussion. For background, I am a developer at Enthought working on the NumPy re-factoring project with Travis and Scott. The immediate goal from our perspective is to re-factor the c