Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-09 Thread Charles R Harris
On 8/8/07, David Cournapeau [EMAIL PROTECTED] wrote:

 Charles R Harris wrote:
  Anne,
 
  On 8/8/07, *Anne Archibald* [EMAIL PROTECTED]
  mailto:[EMAIL PROTECTED] wrote:
 
  On 08/08/2007, Charles R Harris [EMAIL PROTECTED]
  mailto:[EMAIL PROTECTED] wrote:
  
  
   On 8/8/07, Anne Archibald [EMAIL PROTECTED]
  mailto:[EMAIL PROTECTED] wrote:
Oh. Well, it's not *terrible*; it gets you an aligned array. But you
have to allocate the original array as a 1D byte array (to allow for
arbitrary realignments) and then align it, reshape it, and reinterpret
it as a new type. Plus you're allocating an extra ndarray structure,
which will live as long as the new array does; this not only wastes
even more memory than the portable alignment solutions, it clogs up
python's garbage collector.
  
 The ndarray structure doesn't take up much memory, it is the data that is
 large and the data is shared between the original array and the slice. Nor
 does the data type of the slice need changing, one simply uses the desired
 type to begin with, or at least a type of the right size so that a view will
 do the job without copies. Nor do I see how the garbage collector will get
 clogged up, slices are a common feature of using numpy. The slice method
 also has the advantage of being compiler and operating system independent,
 there is a reason Intel used that approach.
 
 I am not sure I understand which approach to which problem you are
 talking about here?

 IMHO, the discussion is becoming a bit carried away. What I was
 suggesting is
 - being able to check whether a given data buffer is aligned to a
 given alignment (easy)
 - being able to request an aligned data buffer: requires aligned
 memory allocators, and some additions to the API for creating arrays.

 This all boils down to the following case: I have a C function which
 requires N bytes aligned data, I want the numpy API to provide this
 capability. I don't understand the discussion on doing it in python:


Well, what you want might be very easy to do in python, we just need to
check the default alignments for doubles and floats for some of the other
compilers, architectures, and OS's out there. On the other hand, you might
not be able to request a c malloc that is aligned in a portable way without
resorting to the same tricks as you do in python. So why not use python and
get the reference counting and garbage collection along with it? What we
want are doubles 8 byte aligned and floats 4 byte aligned. That seems to be
the case with gcc, linux, and the Intel architecture. The idea is to create
a slightly oversize array, then use a slice of the proper size that is 16
byte aligned.

Chuck


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-09 Thread Charles R Harris
On 8/9/07, Charles R Harris [EMAIL PROTECTED] wrote:



 On 8/8/07, David Cournapeau [EMAIL PROTECTED] wrote:
 
  Charles R Harris wrote:
   Anne,
  
   On 8/8/07, *Anne Archibald* [EMAIL PROTECTED]
   mailto: [EMAIL PROTECTED] wrote:
  
   On 08/08/2007, Charles R Harris [EMAIL PROTECTED]
   mailto: [EMAIL PROTECTED] wrote:
   
   
On 8/8/07, Anne Archibald [EMAIL PROTECTED]
   mailto: [EMAIL PROTECTED] wrote:
 Oh. Well, it's not *terrible*; it gets you an aligned array. But you
 have to allocate the original array as a 1D byte array (to allow for
 arbitrary realignments) and then align it, reshape it, and reinterpret
 it as a new type. Plus you're allocating an extra ndarray structure,
 which will live as long as the new array does; this not only wastes
 even more memory than the portable alignment solutions, it clogs up
 python's garbage collector.
   
The ndarray structure doesn't take up much memory, it is the data that is
large and the data is shared between the original array and the slice. Nor
does the data type of the slice need changing, one simply uses the desired
type to begin with, or at least a type of the right size so that a view will
do the job without copies. Nor do I see how the garbage collector will get
clogged up, slices are a common feature of using numpy. The slice method
also has the advantage of being compiler and operating system independent,
there is a reason Intel used that approach.
  
  I am not sure I understand which approach to which problem you are
  talking about here?
 
  IMHO, the discussion is becoming a bit carried away. What I was
  suggesting is
  - being able to check whether a given data buffer is aligned to a
  given alignment (easy)
  - being able to request an aligned data buffer: requires aligned
  memory allocators, and some additions to the API for creating arrays.
 
  This all boils down to the following case: I have a C function which
  requires N bytes aligned data, I want the numpy API to provide this
  capability. I don't understand the discussion on doing it in python:


 Well, what you want might be very easy to do in python, we just need to
 check the default alignments for doubles and floats for some of the other
 compilers, architectures, and OS's out there. On the other hand, you might
 not be able to request a c malloc that is aligned in a portable way without
 resorting to the same tricks as you do in python. So why not use python and
 get the reference counting and garbage collection along with it? What we
 want are doubles 8 byte aligned and floats 4 byte aligned. That seems to be
 the case with gcc, linux, and the Intel architecture. The idea is to create
 a slightly oversize array, then use a slice of the proper size that is 16
 byte aligned.

 Chuck


For instance, in the case of  linux-x86 and linux-x86_64, the following
should work:

In [68]: def align16(n, dtype=float64):
   ....:     size = dtype().dtype.itemsize
   ....:     over = 16/size
   ....:     data = empty(n + over, dtype=dtype)
   ....:     skip = (-data.ctypes.data % 16)/size
   ....:     return data[skip:skip + n]

Of course, now you need to fill in the data.
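
A quick sanity check (illustrative session only; the output shown is what one
would expect on linux-x86/x86_64, not a transcript):

In [69]: a = align16(1000)

In [70]: a.ctypes.data % 16
Out[70]: 0

In [71]: a.size
Out[71]: 1000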

Chuck


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-09 Thread David Cournapeau
Charles R Harris wrote:

 Well, what you want might be very easy to do in python, we just need 
 to check the default alignments for doubles and floats for some of the 
 other compilers, architectures, and OS's out there. On the other hand, 
 you might not be able to request a c malloc that is aligned in a 
 portable way without resorting to the same tricks as you do in python. 
 So why not use python and get the reference counting and garbage 
 collection along with it?
First, doing it in python means that I cannot use the facility from C 
easily. But this is exactly where I need it, and where I would guess 
most people need it. People want to interface numpy with the mkl ? They 
will do it in C, right ? And maybe I am just too dumb to see the 
problem, but I don't see the need for garbage collection and so on :) 
Again, what is needed is:
- an aligned allocator - we can use the one from Steven Johnson, used 
in fftw, which supports more or less the same architectures as numpy
- refactor the array creation functions in C such that the 
implementation takes one additional alignment argument, and the 
original functions are kept identical to before
- add a few utility functions to check whether an array is SSE aligned, 
arbitrarily aligned, etc.

The only non-trivial point is 2. Actually, when I first thought about 
it, I thought about fixing the alignment at compile time, which would have 
made it totally avoidable: it would have been a simple change of the 
definition of PyDataMem_New to an aligned malloc with a constant. I 
already have the code for this, and besides the aligned malloc code, it is 
about a 5-line change to the numpy code, nothing terrible, really.
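
For illustration, a minimal sketch of what the compile-time variant could look
like (assuming posix_memalign is available and treating the data allocator as
a thin macro over malloc; the constant and function names here are made up for
the sketch, not taken from an actual patch):

#include <stdlib.h>

#define NPY_DEFAULT_DATA_ALIGNMENT 16

/* fixed-alignment replacement for plain malloc */
static void *npy_aligned_malloc(size_t size)
{
    void *p = NULL;
    if (posix_memalign(&p, NPY_DEFAULT_DATA_ALIGNMENT, size) != 0) {
        return NULL;
    }
    return p;
}

/* the matching deallocation is still plain free(), so nothing else changes */
#define PyDataMem_New(size) ((char *)npy_aligned_malloc(size))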

David


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-09 Thread David Cournapeau
Charles R Harris wrote:

 Ah, you want it in C.
What would be the use of getting SIMD-aligned arrays in Python?

David


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-09 Thread Charles R Harris
On 8/9/07, David Cournapeau [EMAIL PROTECTED] wrote:

 Charles R Harris wrote:
 
  Well, what you want might be very easy to do in python, we just need
  to check the default alignments for doubles and floats for some of the
  other compilers, architectures, and OS's out there. On the other hand,
  you might not be able to request a c malloc that is aligned in a
  portable way without resorting to the same tricks as you do in python.
  So why not use python and get the reference counting and garbage
  collection along with it?
 First, doing it in python means that I cannot use the facility from C
 easily. But this is exactly where I need it, and where I would guess
 most people need it. People want to interface numpy with the mkl ? They
 will do it in C, right ? And maybe I am just too dumb to see the
 problem, but I don't see the need for garbage collection and so on :)
 Again, what is needed is:
 - an aligned allocator - we can use the one from Steven Johnson, used
 in fftw, which supports more or less the same architectures as numpy
 - refactor the array creation functions in C such that the
 implementation takes one additional alignment argument, and the
 original functions are kept identical to before
 - add a few utility functions to check whether an array is SSE aligned,
 arbitrarily aligned, etc.

 The only non-trivial point is 2. Actually, when I first thought about
 it, I thought about fixing the alignment at compile time, which would have
 made it totally avoidable: it would have been a simple change of the
 definition of PyDataMem_New to an aligned malloc with a constant. I
 already have the code for this, and besides the aligned malloc code, it is
 about a 5-line change to the numpy code, nothing terrible, really.


Ah, you want it in C. Well, I think it would not be too difficult to change
PyDataMem_New, however, the function signature would change and all the code
that used it would break. That is pretty drastic. Better to define
PyDataMem_New_Aligned, then redefine PyDataMem_New to use the new function.
That way nothing breaks and you get the function you need. I don't think
Travis would get upset if you added such a function and documented it.
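
A rough sketch of the signatures that suggests (hypothetical, just to make the
idea concrete; an actual patch would also need a matching free for platforms
where the aligned allocator is not built on plain malloc):

/* new entry point with an explicit alignment argument */
void *PyDataMem_New_Aligned(size_t size, size_t alignment);

/* old entry point kept as a wrapper, so existing callers are unaffected;
   sizeof(double) mimics the alignment plain malloc already guarantees */
#define PyDataMem_New(size) PyDataMem_New_Aligned((size), sizeof(double))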

Chuck


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-09 Thread Charles R Harris
On 8/9/07, David Cournapeau [EMAIL PROTECTED] wrote:

 Charles R Harris wrote:
 
  Ah, you want it in C.
 What would be the use of getting SIMD-aligned arrays in Python?


If I wanted a fairly specialized routine and didn't want to touch the guts
of numpy, I would pass the aligned array to a C function and use the data
pointer. The python code would be just a high level wrapper. You might even
be able to use ctypes to pass the pointer into a library function. It's not
necessary to code everything in C using the python C API.
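
For instance, something along these lines (a sketch only: libmyfilter.so and
filter_inplace are stand-ins for whatever C routine needs the aligned data,
and align16 is the helper from earlier in the thread):

import ctypes
from numpy import float64

lib = ctypes.CDLL("libmyfilter.so")   # hypothetical C library
lib.filter_inplace.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_size_t]
lib.filter_inplace.restype = None

a = align16(1024, dtype=float64)      # aligned array built in Python
assert a.ctypes.data % 16 == 0        # the C side can rely on SSE alignment
lib.filter_inplace(a.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), a.size)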

Chuck


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-09 Thread Stefan van der Walt
On Thu, Aug 09, 2007 at 04:52:38PM +0900, David Cournapeau wrote:
 Charles R Harris wrote:
 
  Well, what you want might be very easy to do in python, we just need 
  to check the default alignments for doubles and floats for some of the 
  other compilers, architectures, and OS's out there. On the other hand, 
  you might not be able to request a c malloc that is aligned in a 
  portable way without resorting to the same tricks as you do in python. 
  So why not use python and get the reference counting and garbage 
  collection along with it?
 First, doing it in python means that I cannot use the facility from C 
 easily. But this is exactly where I need it, and where I would guess 
 most people need it. People want to interface numpy with the mkl ? They 
 will do it in C, right ?

It doesn't really matter where the memory allocation occurs, does it?
As far as I understand, the underlying fftw function has some flag to
indicate when the data is aligned.  If so, we could expose that flag
in Python, and do something like

x = align16(data)
_fft(x, is_aligned=True)

I am not intimately familiar with the fft wrappers, so maybe I'm
missing something more fundamental.

Cheers
Stéfan


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-09 Thread David Cournapeau
On 8/9/07, Stefan van der Walt [EMAIL PROTECTED] wrote:


 It doesn't really matter where the memory allocation occurs, does it?
 As far as I understand, the underlying fftw function has some flag to
 indicate when the data is aligned.  If so, we could expose that flag
 in Python, and do something like

 x = align16(data)
 _fft(x, is_aligned=True)

 I am not intimately familiar with the fft wrappers, so maybe I'm
 missing something more fundamental.

You can do that, but this is only a special case of what I have in
mind. For example, what if you want to call functions which are
relatively cheap but called many times, and want an aligned array?
Going back and forth would be a huge waste. Also, having aligned
buffers internally (in C), even for non-array data, can be useful (e.g.
filters, and maybe even core numpy functionality like ufuncs).
Another point I forgot to mention before is that we can
define a default alignment which would already be SIMD friendly (as
done by the default malloc on Mac OS X or FreeBSD) for *all* numpy arrays
at zero cost: for fft, this means that most arrays would already be as
wanted, meaning a huge performance boost for free.

Basically, the functionality would be more usable in C, without too
much constraint, because frankly, the implementation is not difficult:
I have something almost ready, and the patch is 7 kB, including code to
detect a platform-dependent aligned allocator. The C code can be tested
really easily (since it is independent of Python).

David


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-08 Thread Stefan van der Walt
On Tue, Aug 07, 2007 at 01:33:24AM -0400, Anne Archibald wrote:
 Well, it can be done in Python: just allocate a too-big ndarray and
 take a slice that's the right shape and has the right alignment. But
 this sucks.

Could you explain to me why this is such a bad idea?

Stéfan


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-08 Thread Anne Archibald
On 08/08/2007, Stefan van der Walt [EMAIL PROTECTED] wrote:
 On Tue, Aug 07, 2007 at 01:33:24AM -0400, Anne Archibald wrote:
  Well, it can be done in Python: just allocate a too-big ndarray and
  take a slice that's the right shape and has the right alignment. But
  this sucks.

 Could you explain to me why this is such a bad idea?

Oh. Well, it's not *terrible*; it gets you an aligned array. But you
have to allocate the original array as a 1D byte array (to allow for
arbitrary realignments) and then align it, reshape it, and reinterpret
it as a new type. Plus you're allocating an extra ndarray structure,
which will live as long as the new array does; this not only wastes
even more memory than the portable alignment solutions, it clogs up
python's garbage collector.

It's not outrageous, if you need aligned arrays *now*, on a released
version of numpy, but numpy itself should do better.
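
Concretely, the byte-array version might look something like this (a sketch of
the approach just described; the name aligned_empty_1d is made up):

import numpy as np

def aligned_empty_1d(n, dtype=np.float64, alignment=16):
    itemsize = np.dtype(dtype).itemsize
    nbytes = n * itemsize
    # 1D byte array, oversized so any requested alignment can be reached
    buf = np.empty(nbytes + alignment, dtype=np.uint8)
    offset = (-buf.ctypes.data) % alignment
    # align, then reinterpret the bytes as the requested type
    return buf[offset:offset + nbytes].view(dtype)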

Anne


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-08 Thread Charles R Harris
On 8/8/07, Anne Archibald [EMAIL PROTECTED] wrote:

 On 08/08/2007, Stefan van der Walt [EMAIL PROTECTED] wrote:
  On Tue, Aug 07, 2007 at 01:33:24AM -0400, Anne Archibald wrote:
   Well, it can be done in Python: just allocate a too-big ndarray and
   take a slice that's the right shape and has the right alignment. But
   this sucks.
 
  Could you explain to me why this is such a bad idea?

 Oh. Well, it's not *terrible*; it gets you an aligned array. But you
 have to allocate the original array as a 1D byte array (to allow for
 arbitrary realignments) and then align it, reshape it, and reinterpret
 it as a new type. Plus you're allocating an extra ndarray structure,
 which will live as long as the new array does; this not only wastes
 even more memory than the portable alignment solutions, it clogs up
 python's garbage collector.


The ndarray structure doesn't take up much memory, it is the data that is
large and the data is shared between the original array and the slice. Nor
does the data type of the slice need changing, one simply uses the desired
type to begin with, or at least a type of the right size so that a view will
do the job without copies. Nor do I see how the garbage collector will get
clogged up, slices are a common feature of using numpy. The slice method
also has the advantage of being compiler and operating system independent,
there is a reason Intel used that approach.

Aligning multidimensional arrays might indeed be complicated, but I suspect
those complications will be easier to handle in Python than in C.

Chuck


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-08 Thread Anne Archibald
On 08/08/2007, Charles R Harris [EMAIL PROTECTED] wrote:


 On 8/8/07, Anne Archibald [EMAIL PROTECTED] wrote:
  Oh. Well, it's not *terrible*; it gets you an aligned array. But you
  have to allocate the original array as a 1D byte array (to allow for
  arbitrary realignments) and then align it, reshape it, and reinterpret
  it as a new type. Plus you're allocating an extra ndarray structure,
  which will live as long as the new array does; this not only wastes
  even more memory than the portable alignment solutions, it clogs up
  python's garbage collector.

 The ndarray structure doesn't take up much memory, it is the data that is
 large and the data is shared between the original array and the slice. Nor
 does the data type of the slice need changing, one simply uses the desired
 type to begin with, or at least a type of the right size so that a view will
 do the job without copies. Nor do I see how the garbage collector will get
 clogged up, slices are a common feature of using numpy. The slice method
 also has the advantage of being compiler and operating system independent,
 there is a reason Intel used that approach.

 Aligning multidimensional arrays might indeed be complicated, but I suspect
 those complications will be easier to handle in Python than in C.

Can we assume that numpy arrays allocated to contain (say) complex64s
are aligned to a 16-byte boundary? I don't think they will
necessarily, so the shift we need may not be an integer number of
complex64s. float96s pose even more problems. So to ensure alignment,
we do need to do type conversion; if we're doing it anyway, byte
arrays require the least trust in malloc().

The ndarray object isn't too big, probably some twenty or thirty
bytes, so I'm not talking about a huge waste. But it is a python
object, and the garbage collector needs to walk the whole tree of
accessible python objects every time it runs, so this is one more
object on the list.

As an aside: numpy's handling of ndarray objects is actually not
ideal; if you want to exhaust memory on your system, do:

a = arange(5)
while True:
    a = a[::-1]

Each ndarray object keeps alive the ndarray object it is a slice of,
so this operation creates an ever-growing linked list of ndarray
objects. Seems to me it would be better to keep a pointer only to the
original object that holds the address of the buffer (so it can be
freed).

Aligning multidimensional arrays is an interesting question. To first
order, aligning the first element should be enough. If the dimensions
of the array are not divisible by the alignment, though, this means
that lower-dimensional complete slices may not be aligned:

A = aligned_empty((7,5),dtype=float,alignment=16)

Then A is aligned, as is A[0,:], but A[1,:] is not.

So in this case we might want to actually allocate a 7-by-6 array and
take a slice. This does mean it won't be contiguous in memory, so that
flattening it requires a copy (which may not wind up aligned). This is
something we might want to do - that is, make available as an option -
in python.
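
A sketch of that option (names invented for the example; assumes the itemsize
divides the alignment):

import numpy as np

def aligned_empty_2d(shape, dtype=np.float64, alignment=16):
    rows, cols = shape
    itemsize = np.dtype(dtype).itemsize
    # pad each row up to a whole number of alignment blocks
    padded = -(-cols * itemsize // alignment) * (alignment // itemsize)
    buf = np.empty(rows * padded + alignment // itemsize, dtype=dtype)
    skip = (-buf.ctypes.data % alignment) // itemsize
    data = buf[skip:skip + rows * padded].reshape(rows, padded)
    return data[:, :cols]   # non-contiguous view whose rows all start aligned

For the (7, 5) float64 example above with 16-byte alignment this allocates
7 rows of 6 elements and returns a view of the first 5 columns.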

Anne


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-08 Thread Charles R Harris
Anne,

On 8/8/07, Anne Archibald [EMAIL PROTECTED] wrote:

 On 08/08/2007, Charles R Harris [EMAIL PROTECTED] wrote:
 
 
  On 8/8/07, Anne Archibald [EMAIL PROTECTED] wrote:
   Oh. Well, it's not *terrible*; it gets you an aligned array. But you
   have to allocate the original array as a 1D byte array (to allow for
   arbitrary realignments) and then align it, reshape it, and reinterpret
   it as a new type. Plus you're allocating an extra ndarray structure,
   which will live as long as the new array does; this not only wastes
   even more memory than the portable alignment solutions, it clogs up
   python's garbage collector.
 
  The ndarray structure doesn't take up much memory, it is the data that is
  large and the data is shared between the original array and the slice. Nor
  does the data type of the slice need changing, one simply uses the desired
  type to begin with, or at least a type of the right size so that a view will
  do the job without copies. Nor do I see how the garbage collector will get
  clogged up, slices are a common feature of using numpy. The slice method
  also has the advantage of being compiler and operating system independent,
  there is a reason Intel used that approach.

  Aligning multidimensional arrays might indeed be complicated, but I suspect
  those complications will be easier to handle in Python than in C.

 Can we assume that numpy arrays allocated to contain (say) complex64s
 are aligned to a 16-byte boundary? I don't think they will
 necessarily, so the shift we need may not be an integer number of
 complex64s. float96s pose even more problems. So to ensure alignment,
 we do need to do type conversion; if we're doing it anyway, byte
 arrays require the least trust in malloc().


I think that is a safe assumption, it is probably almost as safe as assuming
binary and two's complement, likely safer than assuming IEEE 754.  I
expect almost all 32 bit OSes to align on 4 byte boundaries at worst, 64 bit
machines to align on 8 byte boundaries. Even C structures are typically
filled out with blanks to preserve some sort of alignment. That is because
of addressing efficiency, or even the impossibility of odd addressing --
depends on the architecture. Sometimes even byte addressing is easier to get
by putting a larger integer on the bus and extracting the relevant part. In
addition, I expect the heap implementation to make some alignment decisions
for efficiency.

My 64 bit linux on Intel aligns arrays, whatever the data type, on 16 byte
boundaries. It might be interesting to see what happens with the Intel and
MSVC compilers, but I expect similar results. PPC, Sun, and SGI need to be
checked, but I don't expect problems. I think that will cover almost all
architectures numpy is likely to run on.
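
For anyone who wants to probe their own platform, a quick check like the
following shows what the allocator actually hands back (results vary by OS and
allocator; the comment is an expectation, not a measurement from this thread):

import numpy as np

# data-pointer remainders modulo 16 for a batch of freshly allocated arrays
remainders = {np.empty(1000, dtype=np.float64).ctypes.data % 16 for _ in range(100)}
print(remainders)   # e.g. {0} if malloc always returns 16-byte aligned blocks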


 The ndarray object isn't too big, probably some twenty or thirty
 bytes, so I'm not talking about a huge waste. But it is a python
 object, and the garbage collector needs to walk the whole tree of
 accessible python objects every time it runs, so this is one more
 object on the list.

 As an aside: numpy's handling of ndarray objects is actually not
 ideal; if you want to exhaust memory on your system, do:

 a = arange(5)
 while True:
     a = a[::-1]


Well, that's a pathological case present in numpy. Fixing it doesn't seem to
be a high priority although there is a ticket somewhere.

Each ndarray object keeps alive the ndarray object it is a slice of,
 so this operation creates an ever-growing linked list of ndarray
 objects. Seems to me it would be better to keep a pointer only to the
 original object that holds the address of the buffer (so it can be
 freed).

 Aligning multidimensional arrays is an interesting question. To first
 order, aligning the first element should be enough. If the dimensions
 of the array are not divisible by the alignment, though, this means
 that lower-dimensional complete slices may not be aligned:

 A = aligned_empty((7,5),dtype=float,alignment=16)

 Then A is aligned, as is A[0,:], but A[1,:] is not.

  So in this case we might want to actually allocate a 7-by-6 array and
 take a slice. This does mean it won't be contiguous in memory, so that
 flattening it requires a copy (which may not wind up aligned). This is
 something we might want to do - that is, make available as an option -
 in python.


I think that is better viewed as need-based. I suspect that if you really
need such alignment it is better to start with array dimensions that will
naturally align the rows. It will be impossible to naturally align all the
columns unless the data type is the correct size.

Chuck


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-08 Thread Matthieu Brucher

 My 64 bit linux on Intel aligns arrays, whatever the data type, on 16 byte
 boundaries. It might be interesting to see what happens with the Intel and
 MSVC compilers, but I expect similar results.


According to the doc on the msdn, the data should be 16-bits aligned.

Matthieu


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-08 Thread Charles R Harris
On 8/8/07, Matthieu Brucher [EMAIL PROTECTED] wrote:

 My 64 bit linux on Intel aligns arrays, whatever the data type, on 16 byte
  boundaries. It might be interesting to see what happens with the Intel and
  MSVC compilers, but I expect similar results.
 

 According to the doc on the msdn, the data should be 16-bits aligned.


Shades of DOS and 16 bit machines. Have you checked what actually happens on
modern hardware?

Chuck


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-08 Thread David Cournapeau
Charles R Harris wrote:
 Anne,

 On 8/8/07, *Anne Archibald* [EMAIL PROTECTED] 
 mailto:[EMAIL PROTECTED] wrote:

 On 08/08/2007, Charles R Harris [EMAIL PROTECTED]
 mailto:[EMAIL PROTECTED] wrote:
 
 
  On 8/8/07, Anne Archibald [EMAIL PROTECTED]
 mailto:[EMAIL PROTECTED] wrote:
   Oh. Well, it's not *terrible*; it gets you an aligned array. But you
   have to allocate the original array as a 1D byte array (to allow for
   arbitrary realignments) and then align it, reshape it, and reinterpret
   it as a new type. Plus you're allocating an extra ndarray structure,
   which will live as long as the new array does; this not only wastes
   even more memory than the portable alignment solutions, it clogs up
   python's garbage collector.
 
  The ndarray structure doesn't take up much memory, it is the data that is
  large and the data is shared between the original array and the slice. Nor
  does the data type of the slice need changing, one simply uses the desired
  type to begin with, or at least a type of the right size so that a view will
  do the job without copies. Nor do I see how the garbage collector will get
  clogged up, slices are a common feature of using numpy. The slice method
  also has the advantage of being compiler and operating system independent,
  there is a reason Intel used that approach.

I am not sure I understand which approach to which problem you are 
talking about here?

IMHO, the discussion is becoming a bit carried away. What I was 
suggesting is
- being able to check whether a given data buffer is aligned to a 
given alignment (easy)
- being able to request an aligned data buffer: requires aligned 
memory allocators, and some additions to the API for creating arrays.

This all boils down to the following case: I have a C function which 
requires N bytes aligned data, I want the numpy API to provide this 
capability. I don't understand the discussion on doing it in python: 
first, this means you cannot request a data buffer at the C level, and I 
don't understand the whole discussion on slices, multiple dimensions, and so 
on either: at the C level, different libraries may need different array 
formats, and in the case of fftw, all it cares about is the alignment of 
the data pointer. For contiguous, C-order arrays, as long as the data 
pointer is aligned, I don't think we need more; is anyone familiar 
with the MKL who could tell whether we need more?

cheers,

David


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-07 Thread Matthieu Brucher

 For platforms without posix_memalign, I don't see how to
 implement a memory allocator with an arbitrary alignment (more
 precisely, I don't see how to free it if I cannot assume a fixed
 alignement: how do I know where the real pointer is ?).


Visual Studio seems to offer a counterpart (also note that its malloc is
supposed to return a pointer on a 16-byte boundary) which is called
_aligned_malloc (
http://msdn2.microsoft.com/en-us/library/8z34s9c6(VS.80).aspx). It should be
what you need, at least for Windows/MSVC.
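
A minimal standalone example of that CRT pair (Windows/MSVC only, purely
illustrative):

#include <malloc.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* request 16-byte alignment, as SSE wants */
    double *buf = (double *)_aligned_malloc(1024 * sizeof(double), 16);
    if (buf == NULL) {
        return 1;
    }
    printf("aligned: %d\n", (int)(((uintptr_t)buf % 16) == 0));
    _aligned_free(buf);
    return 0;
}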

Matthieu


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-07 Thread David Cournapeau
Anne Archibald wrote:
 Well, it can be done in Python: just allocate a too-big ndarray and
 take a slice that's the right shape and has the right alignment. But
 this sucks. Steven G. Johnson posted code earlier in this thread that
 provides a portable aligned-memory allocator - it handles the freeing
 by (always) storing enough information to recover the original pointer
 in the padding space. (This means you always need to pad, which is a
 pain, but there's not much you can do about that.) 
This is indeed not rocket science; I feel a bit ashamed :) I don't see 
the problem with padding (except wasted time)?
 His implementation
 stores the original pointer just before the beginning of the aligned
 data, so _aligned_free is free(((void**)ptr)[-1]). If you were worried
 about space (rather than time) you could store a single byte just
 before the pointer whose value indicated how much padding was done, or
 whatever.
I really don't see how space would be a problem in our situation: it is 
not like we will pad more than a few bytes; and if it were, I don't 
see how Python would be the right choice anymore anyway. I will try to 
prepare a patch in the next few days, then.

cheers,

David


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-07 Thread Charles R Harris
On 8/6/07, Anne Archibald [EMAIL PROTECTED] wrote:

 On 06/08/07, David Cournapeau [EMAIL PROTECTED] wrote:

  Well, when I proposed the SIMD extension, I was willing to implement the
  proposal, and this was for a simple goal: enabling better integration
  with many numeric libraries which need SIMD alignment.
 
  As nice as a custom allocator might be, I will certainly not implement
  it myself. For SIMD, I think the weight adding complexity / benefit
  worth it (since there is not much change to the API and implementation),
  and I know more or less how to do it; for custom allocator, that's an
  entirely different story. That's really more complex; static pools may
  be useful in some cases (but that's not obvious, since only the data are
  allocated with this buffer, everything else being allocated through the
  python memory allocator, and numpy arrays have pretty simple memory
  allocation patterns).

 I have to agree. I can hardly volunteer David for anything, and I
 don't have time to implement this myself, but I think a custom
 allocator is a rather special-purpose tool; if one were to implement
 one, I think the way to go would be to implement a subclass of ndarray
 (or just a constructor) that allocated the memory. This could be done
 from python, since you can make an ndarray from scratch using a given
 memory array. Of course, making temporaries be allocated with the
 correct allocator will be very complicated, since it's unclear which
 allocator should be used.


Maybe I'm missing something, but handling the temporaries is automatic. Just
return the appropriate slice from an array created in a subroutine. The
original array gets its reference count decremented when the routine exits
but the slice will still hold one. When the slice is deleted all the
allocated memory will get garbage collected.

Chuck


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-06 Thread Lisandro Dalcin
On 8/3/07, David Cournapeau [EMAIL PROTECTED] wrote:
  Here is what I can think of:
   - adding an API to know whether a given PyArrayObject has its data
 buffer 16 bytes aligned, and requesting a 16 bytes aligned
 PyArrayObject. Something like NPY_ALIGNED, basically.
   - forcing data allocation to be 16 bytes aligned in numpy (eg
 define PyDataMem_Mem to a 16 bytes aligned allocator instead of malloc).

All this sounds pretty similar to the std::allocator found in the C++
STL (http://www.sgi.com/tech/stl/Allocators.html). Perhaps a NumPy
array could be associated with an instance of an 'allocator' object
(surely written in C, perhaps subclassable in Python) providing
appropriate methods for
alloc/dealloc(/realloc?/initialize(memset)?/copy(memcpy)?) of memory.

This would be really nice, as it is extensible (you could even write a
custom allocator, perhaps making use of a preallocated, static pool; of
C++ new/delete; of any C++ std::allocator; of shared memory, etc.). I
think this is the direction to go, but I have no idea how difficult it
could be to implement.


-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-06 Thread Anne Archibald
On 06/08/07, David Cournapeau [EMAIL PROTECTED] wrote:

 Well, when I proposed the SIMD extension, I was willing to implement the
 proposal, and this was for a simple goal: enabling better integration
 with many numeric libraries which need SIMD alignment.

 As nice as a custom allocator might be, I will certainly not implement
 it myself. For SIMD, I think the weight adding complexity / benefit
 worth it (since there is not much change to the API and implementation),
 and I know more or less how to do it; for custom allocator, that's an
 entirely different story. That's really more complex; static pools may
 be useful in some cases (but that's not obvious, since only the data are
 allocated with this buffer, everything else being allocated through the
 python memory allocator, and numpy arrays have pretty simple memory
 allocation patterns).

I have to agree. I can hardly volunteer David for anything, and I
don't have time to implement this myself, but I think a custom
allocator is a rather special-purpose tool; if one were to implement
one, I think the way to go would be to implement a subclass of ndarray
(or just a constructor) that allocated the memory. This could be done
from python, since you can make an ndarray from scratch using a given
memory array. Of course, making temporaries be allocated with the
correct allocator will be very complicated, since it's unclear which
allocator should be used.

Adding SIMD alignment should be a very small modification; it can be
done as simply as using ctypes to wrap posix_memalign (or a portable
version, possibly written in python) and writing a simple python
function that checks the beginning data address. There's really no
need to make it complicated.
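
A sketch of that ctypes route (Linux-specific library name, and the buffer is
deliberately never freed, just to show the shape of the idea; none of these
names come from an actual numpy API):

import ctypes
import numpy as np

libc = ctypes.CDLL("libc.so.6", use_errno=True)   # Linux; other platforms differ
libc.posix_memalign.argtypes = [ctypes.POINTER(ctypes.c_void_p),
                                ctypes.c_size_t, ctypes.c_size_t]
libc.posix_memalign.restype = ctypes.c_int

def posix_memalign_empty(n, dtype=np.float64, alignment=16):
    # a real implementation would have to tie free() to the array's lifetime
    nbytes = n * np.dtype(dtype).itemsize
    ptr = ctypes.c_void_p()
    if libc.posix_memalign(ctypes.byref(ptr), alignment, nbytes) != 0:
        raise MemoryError("posix_memalign failed")
    buf = (ctypes.c_char * nbytes).from_address(ptr.value)
    return np.frombuffer(buf, dtype=dtype)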

Anne


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-06 Thread David Cournapeau
Anne Archibald wrote:

 I have to agree. I can hardly volunteer David for anything, and I
 don't have time to implement this myself, but I think a custom
 allocator is a rather special-purpose tool; if one were to implement
 one, I think the way to go would be to implement a subclass of ndarray
 (or just a constructor) that allocated the memory. This could be done
 from python, since you can make an ndarray from scratch using a given
 memory array. Of course, making temporaries be allocated with the
 correct allocator will be very complicated, since it's unclear which
 allocator should be used.

 Adding SIMD alignment should be a very small modification; it can be
 done as simply as using ctypes to wrap posix_memalign (or a portable
 version, possibly written in python) and writing a simple python
 function that checks the beginning data address. There's really no
 need to make it complicated.

Anne, you said previously that it was easy to allocate buffers for a 
given alignment at runtime. Could you point me to a document which 
explains how? For platforms without posix_memalign, I don't see how to 
implement a memory allocator with an arbitrary alignment (more 
precisely, I don't see how to free it if I cannot assume a fixed 
alignment: how do I know where the real pointer is?).

David


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-06 Thread Anne Archibald
On 07/08/07, David Cournapeau [EMAIL PROTECTED] wrote:

 Anne, you said previously that it was easy to allocate buffers for a
 given alignment at runtime. Could you point me to a document which
 explains how ? For platforms without posix_memalign, I don't see how to
 implement a memory allocator with an arbitrary alignment (more
 precisely, I don't see how to free it if I cannot assume a fixed
 alignement: how do I know where the real pointer is ?).

Well, it can be done in Python: just allocate a too-big ndarray and
take a slice that's the right shape and has the right alignment. But
this sucks. Steven G. Johnson posted code earlier in this thread that
provides a portable aligned-memory allocator - it handles the freeing
by (always) storing enough information to recover the original pointer
in the padding space. (This means you always need to pad, which is a
pain, but there's not much you can do about that.) His implementation
stores the original pointer just before the beginning of the aligned
data, so _aligned_free is free(((void**)ptr)[-1]). If you were worried
about space (rather than time) you could store a single byte just
before the pointer whose value indicated how much padding was done, or
whatever.
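
In outline, the scheme is (a minimal sketch, not Steven Johnson's actual code;
assumes the alignment is a power of two):

#include <stdlib.h>
#include <stdint.h>

static void *my_aligned_malloc(size_t size, size_t alignment)
{
    /* room for the saved pointer plus worst-case alignment slack */
    void *raw = malloc(size + alignment - 1 + sizeof(void *));
    uintptr_t start;
    void *aligned;

    if (raw == NULL) {
        return NULL;
    }
    start = (uintptr_t)raw + sizeof(void *);
    aligned = (void *)((start + alignment - 1) & ~((uintptr_t)alignment - 1));
    ((void **)aligned)[-1] = raw;   /* stash the original pointer in the padding */
    return aligned;
}

static void my_aligned_free(void *ptr)
{
    if (ptr != NULL) {
        free(((void **)ptr)[-1]);   /* recover and free the original pointer */
    }
}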

These schemes all waste space, but unless malloc's internal structures
are the size of the alignment block, it's almost unavoidable to waste
some space; the only way around it I can see is if the program also
allocates lots of small, odd-shaped, unaligned blocks of memory that
can be used to fill the gaps (and even then I doubt any sensible
malloc implementation fills in little gaps like this, since it seems
likely to lead to memory fragmentation). A posix_memalign that is
built into malloc can do better than any implementation that isn't,
though, with the possible exception of a specialized pool allocator
built with aligned allocation in mind.

Anne


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-05 Thread Steven G. Johnson
On Aug 4, 3:24 am, Anne Archibald [EMAIL PROTECTED] wrote:

 It seems to me two things are needed:

 * A mechanism for requesting numpy arrays with buffers aligned to an
 arbitrary power-of-two size (basically just using posix_memalign or
 some horrible hack on platforms that don't have it).

Right, you might as well allow the alignment (to a power-of-two size)
to be specified at runtime, as there is really no cost to implementing
an arbitrary alignment once you have any alignment.

Although you should definitely use posix_memalign (or the old
memalign) where it is available, unfortunately it's not implemented on
all systems.  e.g. MacOS X and FreeBSD don't have it, last I checked
(although in both cases their malloc is 16-byte aligned).  Microsoft
VC++ has a function called _aligned_malloc which is equivalent.

However, since MinGW (www.mingw.org) didn't have an _aligned_malloc
function, I wrote one for them a few years ago and put it in the
public domain (I use MinGW to cross-compile to Windows from Linux and
need the alignment).  You are free to use it as a fallback on systems
that don't have a memalign function if you want.  It should work on
any system where sizeof(void*) is a power of two (i.e. every extant
architecture, that I know of).  You can download it and its test
program from:
   ab-initio.mit.edu/~stevenj/align.c
   ab-initio.mit.edu/~stevenj/tstalign.c
It just uses malloc with a little extra padding as needed to align the
data, plus a copy of the original pointer so that you can still free
and realloc (using _aligned_free and _aligned_realloc).  It could be
made a bit more efficient, but it probably doesn't matter.

 * A macro (in C, and some way to get the same information from python,
 perhaps just a.ctypes.data % 16) to test for common alignment cases;
 SIMD alignment and arbitrary power-of-two alignment are probably
 sufficient.

In C this is easy, just ((uintptr_t) pointer) % 16 == 0.

You might also consider a way to set the default alignment of numpy
arrays at runtime, rather than requesting aligned arrays
individually.  e.g. so that someone could come along at a later date
to a large program and just add one function call to make all the
arrays 16-byte aligned to improve performance using SIMD libraries.
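
That could be as small as a module-level knob consulted by the allocator
(hypothetical sketch, not an existing numpy interface):

/* default data alignment used by subsequent array allocations */
static size_t npy_default_alignment = sizeof(double);

void npy_set_default_alignment(size_t alignment)
{
    /* e.g. call npy_set_default_alignment(16) once at startup so every new
       array is SSE-friendly; a thin Python wrapper could expose this */
    npy_default_alignment = alignment;
}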

Regards,
Steven G. Johnson



Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-04 Thread Charles R Harris
On 8/3/07, Charles R Harris [EMAIL PROTECTED] wrote:



 On 8/3/07, David Cournapeau [EMAIL PROTECTED] wrote:
 
  Andrew Straw wrote:
   Dear David,
  
   Both ideas, particularly the 2nd, would be excellent additions to numpy.
   I often use the Intel IPP (Integrated Performance Primitives) Library
   together with numpy, but I have to do all my memory allocation with the
   IPP to ensure fastest operation. I then create numpy views of the data.
   All this works brilliantly, but it would be really nice if I could
   allocate the memory directly in numpy.
  
   IPP allocates, and says it wants, 32 byte aligned memory (see, e.g.
   http://www.intel.com/support/performancetools/sb/CS-021418.htm ). Given
   that fftw3 apparently wants 16 byte aligned memory, my feeling is that,
   if the effort is made, the alignment width should be specified at
   run-time, rather than hard-coded.
  I think that doing it at runtime would be overkill, no ? I was thinking
  about making it a compile option. Generally, at the ASM level, you need
  16 bytes alignment (for instructions like movaps, which takes 16 bytes
  in memory and put it in the SSE registers), this is not just fftw. Maybe
  the 32 bytes alignment is useful for cache reasons, I don't know.
 
  I don't think it would be difficult to implement and validate; what I
  don't know at all is the implication of this at the binary level, if
  any.



 Here's a hack that google turned up:

 (1) Use static variables instead of dynamic (stack) variables
 (2) Use in-line assembly code that explicitly aligns data
 (3) In C code, use malloc to explicitly allocate variables

 Here is Intel's example of (2):

 ; procedure prologue
 push ebp
 mov esp, ebp
 and ebp, -8
 sub esp, 12

 ; procedure epilogue
 add esp, 12
 pop ebp
 ret

 Intel's example of (3), slightly modified:

 double *p, *newp;
 p = (double*)malloc((sizeof(double)*NPTS)+4);
 newp = (p+4) & (~7);

 This assures that newp is 8-byte aligned even if p is not. However,
 malloc() may already follow Intel's recommendation that 32-byte or
 greater data structures be aligned on a 32-byte boundary. In that case,
 increasing the requested memory by 4 bytes and computing newp are
 superfluous.


I think that for numpy arrays it should be possible to define the offset so
that the result is 32 byte aligned. However, this might break some people's
code if they haven't paid attention to the offset. Another possibility is
to allocate an oversized array, check the pointer, and take a range out of
it. For instance:

In [32]: a = zeros(10)

In [33]: a.ctypes.data % 32
Out[33]: 16

The array alignment is 16 bytes, consequently

In [34]: a[2:].ctypes.data % 32
Out[34]: 0

Voila, 32 byte alignment. I think a short python routine could do this,
which ought to serve well for 1D fft's. Multidimensional arrays will be
trickier if you want the rows to be aligned. Aligning the columns just isn't
going to work.

Chuck


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-04 Thread David Cournapeau



 Here's a hack that google turned up:

 (1) Use static variables instead of dynamic (stack) variables
 (2) Use in-line assembly code that explicitly aligns data
 (3) In C code, use malloc to explicitly allocate variables

 Here is Intel's example of (2):

 ; procedure prologue
 push ebp
 mov esp, ebp
 and ebp, -8
 sub esp, 12

 ; procedure epilogue
 add esp, 12
 pop ebp
 ret

 Intel's example of (3), slightly modified:

 double *p, *newp;
 p = (double*)malloc((sizeof(double)*NPTS)+4);
 newp = (p+4) & (~7);

 This assures that newp is 8-byte aligned even if p is not. However,
 malloc() may already follow Intel's recommendation that 32-byte or
 greater data structures be aligned on a 32-byte boundary. In that case,
 increasing the requested memory by 4 bytes and computing newp are
 superfluous.


 I think that for numpy arrays it should be possible to define the 
 offset so that the result is 32 byte aligned. However, this might 
  break some people's code if they haven't paid attention to the offset.
Why? I really don't see how it can break anything at the source code 
level. You don't have to care about things you didn't care about before: the 
best proof of that is that numpy already runs on different platforms where 
malloc has different alignment guarantees (Mac OS X already aligns to 
16 bytes, for the very reason of making SIMD optimization easier, 
whereas glibc malloc only aligns to 8 bytes, at least on Linux).
 Another possibility is to allocate an oversized array, check the 
 pointer, and take a range out of it. For instance:

 In [32]: a = zeros(10)

 In [33]: a.ctypes.data % 32
 Out[33]: 16

 The array alignment is 16 bytes, consequently

 In [34]: a[2:].ctypes.data % 32
 Out[34]: 0

 Voila, 32 byte alignment. I think a short python routine could do 
 this, which ought to serve well for 1D fft's. Multidimensional arrays 
 will be trickier if you want the rows to be aligned. Aligning the 
 columns just isn't going to work.
I am not suggesting realigning existing arrays. What I would like numpy 
to support are the following cases:

 - Check whether a given numpy array is SIMD aligned:

/* Simple case: if aligned, use the optimized func, use the non-optimized
   one otherwise */
int simd_func(double* in, size_t n);
int nosimd_func(double* in, size_t n);

if (PyArray_ISALIGNED_SIMD(a)) {
    simd_func((double *)a->data, a->size);
} else {
    nosimd_func((double *)a->data, a->size);
}

 - Request explicitly an aligned array from any PyArray_* function 
which creates an ndarray, e.g.: ar = PyArray_FROM_OF(a, NPY_SIMD_ALIGNED);

Allocating a buffer aligned to a given alignment is not the problem: 
there is a POSIX function to do it, and we can easily implement a 
function for the OSes which do not support it. This would be done in C, not 
in Python.

cheers,

David


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-04 Thread Anne Archibald
On 04/08/07, David Cournapeau [EMAIL PROTECTED] wrote:

  Here's a hack that google turned up:

I'd avoid hacks in favour of posix_memalign (which allows arbitrary
degrees of alignment). For one thing, freeing becomes a headache (you
can't free a pointer you've jiggered!).

  - Check whether a given numpy array is SIMD aligned:

 /* Simple case: if aligned, use the optimized func, use the non-optimized
    one otherwise */
 int simd_func(double* in, size_t n);
 int nosimd_func(double* in, size_t n);

 if (PyArray_ISALIGNED_SIMD(a)) {
     simd_func((double *)a->data, a->size);
 } else {
     nosimd_func((double *)a->data, a->size);
 }

  - Request explicitly an aligned array from any PyArray_* function
 which creates an ndarray, e.g.: ar = PyArray_FROM_OF(a, NPY_SIMD_ALIGNED);

 Allocating a buffer aligned to a given alignment is not the problem:
 there is a POSIX function to do it, and we can easily implement a
 function for the OSes which do not support it. This would be done in C, not
 in Python.

I'd just like to point out that PyArray_ISALIGNED_SIMD(a) can be a
macro which expands to something like !(((uintptr_t)a->data) & 0xf); this
avoids any change to the array objects and allows checking for
arbitrary degrees of alignment - somebody mentioned the Intel
Performance Primitives need 32-byte aligned data? One might also want
page-aligned data or data aligned in some way with cache lines.

It seems to me two things are needed:

* A mechanism for requesting numpy arrays with buffers aligned to an
arbitrary power-of-two size (basically just using posix_memalign or
some horrible hack on platforms that don't have it).

* A macro (in C, and some way to get the same information from python,
perhaps just a.ctypes.data % 16) to test for common alignment cases;
SIMD alignment and arbitrary power-of-two alignment are probably
sufficient.

Does this fail to cover any important cases?
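
For the second point, the C side could be as small as the following
(illustrative macro; PyArray_DATA is the usual accessor for the data pointer,
and n must be a power of two):

#include <stdint.h>

#define NPY_DATA_IS_ALIGNED(arr, n) \
    ((((uintptr_t)PyArray_DATA(arr)) & ((uintptr_t)(n) - 1)) == 0)

From Python the equivalent check is just a.ctypes.data % n == 0, as above.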

Anne


Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

2007-08-03 Thread Charles R Harris
On 8/3/07, David Cournapeau [EMAIL PROTECTED] wrote:

 Andrew Straw wrote:
  Dear David,
 
  Both ideas, particularly the 2nd, would be excellent additions to numpy.
  I often use the Intel IPP (Integrated Performance Primitives) Library
  together with numpy, but I have to do all my memory allocation with the
  IPP to ensure fastest operation. I then create numpy views of the data.
  All this works brilliantly, but it would be really nice if I could
  allocate the memory directly in numpy.
 
  IPP allocates, and says it wants, 32 byte aligned memory (see, e.g.
  http://www.intel.com/support/performancetools/sb/CS-021418.htm ). Given
  that fftw3 apparently wants 16 byte aligned memory, my feeling is that,
if the effort is made, the alignment width should be specified at
  run-time, rather than hard-coded.
 I think that doing it at runtime would be overkill, no ? I was thinking
 about making it a compile option. Generally, at the ASM level, you need
 16 bytes alignment (for instructions like movaps, which takes 16 bytes
 in memory and put it in the SSE registers), this is not just fftw. Maybe
 the 32 bytes alignment is useful for cache reasons, I don't know.

 I don't think it would be difficult to implement and validate; what I
 don't know at all is the implication of this at the binary level, if any.



Here's a hack that google turned up:

(1) Use static variables instead of dynamic (stack) variables
(2) Use in-line assembly code that explicitly aligns data
(3) In C code, use malloc to explicitly allocate variables

Here is Intel's example of (2):

; procedure prologue
push ebp
mov esp, ebp
and ebp, -8
sub esp, 12

; procedure epilogue
add esp, 12
pop ebp
ret

Intel's example of (3), slightly modified:

double *p, *newp;
p = (double*)malloc((sizeof(double)*NPTS)+4);
newp = (p+4) & (~7);

This assures that newp is 8-byte aligned even if p is not. However,
malloc() may already follow Intel's recommendation that 32-byte or
greater data structures be aligned on a 32-byte boundary. In that case,
increasing the requested memory by 4 bytes and computing newp are
superfluous.


Chuck